CONSERVED AND SPECIFIC STREPTOCOCCAL GENOMES

FIELD OF THE INVENTION 

The invention relates to polynucleotides which are conserved or specific to 
one or more species of Streptococcus, Streptococcus species serotypes, and/or 
serotype isolates. The conserved or specific genomic regions can be used to identify, 
screen and develop vaccines and other treatments for Streptococcal infections and can 
be used in diagnostic assays to diagnose and identify Streptococcal infections. 

BACKGROUND OF THE INVENTION 

The genus Streptococcus consists of Gram-positive, chain-forming, spherical
bacterial cells. Three species of clinical interest are S. pneumoniae ("pneumococcus"
or "S. pn."), S. pyogenes ("group A streptococcus" or "GAS") and S. agalactiae ("group
B streptococcus" or "GBS"). Infections with these three pathogenic streptococci lead
to conditions including pharyngitis, toxic shock syndrome and necrotizing fasciitis.

Once thought to infect only cows, GBS is now known to cause serious disease, 
bacteraemia and meningitis in immunocompromised individuals and neonates. There 
are two known types of neonatal infection. The first (early onset, usually within 5 
days of birth) is manifested by bacteraemia and infection. It is generally contracted 
vertically as a baby passes through the birth canal. GBS is thought to colonize the 
vagina of about 25% of young women; approximately 1% of infants born via a 
vaginal birth to colonised mothers will become infected. Mortality resulting from 
these infections is between 50 - 70%. The second type of neonatal infection is a 
meningitis that occurs 10 to 60 days after birth. If pregnant women are vaccinated 
with type III capsule so that the infants are passively immunised, the incidence of the
late onset meningitis is generally reduced, although not entirely eliminated. 

The "B" in "GBS" refers to the Lancefield classification, which is based on 
the antigenicity of a carbohydrate which is soluble in dilute acid and called the C 
carbohydrate. Lancefield identified 13 types of C carbohydrate, designated A to O, 
that could be serologically differentiated. The organisms that most commonly infect
humans are found in groups A, B, D, and G. Within group B, strains can be divided
into at least 9 serotypes (Ia, Ib, II, III, IV, V, VI, VII, and VIII) based on the structure
of their polysaccharide capsule. Further categories based on, for example, the 
expression of certain proteins have also been developed. 

GBS strains of polysaccharide capsule Type V were rarely isolated before the 
mid-1980s but now account for approximately one-third of clinical isolates in the US.
Type V is the most common capsular serotype associated with invasive infection in
nonpregnant adults, and the emergence of Type V strains over the past decade has been
temporally linked to an increase in GBS disease in this population.

Group A streptococcus is a frequent human pathogen, estimated to be present 
in between 5 and 15% of normal individuals without signs of disease. When host
defences are compromised, or when the organism is able to exert its virulence, or 
when it is introduced into vulnerable tissues or hosts, however, an acute infection 
occurs. Diseases include puerperal fever, scarlet fever, erysipelas, pharyngitis, 
impetigo, necrotising fasciitis, myositis and streptococcal toxic shock syndrome. 

Pneumococcus is the most common cause of acute respiratory infection and 
otitis media and is estimated to result in over 3 million deaths in children every year 
worldwide from pneumonia, bacteremia, or meningitis. Even more deaths occur 
among elderly people, among whom S. pn. is the leading cause of community- 
acquired pneumonia and meningitis. Since 1990, the number of penicillin-resistant 
strains has increased from 1 to 5% to 25 to 80% of isolates, and many strains are now 
resistant to commonly prescribed antibiotics such as penicillin, macrolides, and 
fluoroquinolones. See Tettelin, et al. (2001) Science 293, 498-506.

The complete genomic sequence of a virulent isolate of S. pneumoniae was 
published by Tettelin, et al. (2001) Science 293, 498-506 and is available at the TIGR
website at http://www.tigr.com. The genomic sequence, the Tettelin article and its 
published supplemental material are incorporated herein by reference in their entirety. 

The complete genomic sequence of an M1 strain of S. pyogenes was
published by Ferretti, et al. (2001) Proc. Natl. Acad. Sci. USA 98, 4658 - 4663 and is
available at the TIGR website at http://www.tigr.com. The genomic sequence, the 
Ferretti article and its published supplemental materials are incorporated herein by 
reference in their entirety. 

The complete genomic sequence of a serotype V strain of S. agalactiae (type V 
strain 2603 V/R) is published on the date of this filing, August 26, 2002 by Tettelin, et 
al. (2002) Proc. Natl. Acad. Sci. USA and/or is available on the same day at the TIGR
website at http://www.tigr.com. Most of this sequence is also available in PCT
International Patent Application Publication WO 02/34771. The genomic sequence,
the Tettelin article and its published supplemental materials are incorporated herein 
by reference in their entirety. 

Current treatments for Streptococcal infections include both antibiotics and 
prophylactic vaccination. Current vaccines, particularly with respect to GBS, suffer 
from poor immunogenicity, while the emergence of antibiotic resistant strains has 
lessened the effectiveness of currently used antibiotics. Accordingly, there is an 
increasing need for the development of new vaccines and antibiotics (as well as other 
small molecule bacterial inhibitors) to help prevent and treat Streptococcal infections. 

Applicants have identified regions of the Streptococcal genomes which can be 
used to identify and develop new vaccines and treatments for Streptococcal infections. 
Specifically, Applicants have identified polynucleotides of the Streptococcal genome 
which are conserved or specific to Streptococcal species, species serotypes, and/or 
specific serotype isolates. These polynucleotides and their expressed polypeptides 
can be used to screen, develop and design new vaccines, antibiotics and other small 
molecule bacterial inhibitors. These polynucleotides and their expressed polypeptides 
can further be used to diagnose and identify Streptococcal infections.

SUMMARY OF THE INVENTION



The invention relates to polynucleotides which are conserved or specific to 
one or more species of Streptococcus, Streptococcus species serotypes, and/or 
serotype isolates. In particular, the invention relates to polynucleotides from 
Streptococcus which are conserved or specific to one or more of the species of S. 
pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group A streptococcus" or
"GAS"), and S. agalactiae ("group B streptococcus" or "GBS"). The invention
further relates to polynucleotides which are conserved or specific to one or more
Streptococcal species serotypes, such as GBS serotypes Ia, Ib, II, III, IV, V, VI, VII,
and VIII. The invention still further relates to polynucleotides which are conserved or
specific to one or more clinical isolates of a Streptococcus species. 

The invention is based on the identification of the following Subsets of genes. 
Genes falling within each subset are described with respect to referenced tables, lists, 
and/or figures (in particular the CGH map depicted in Figure 1). 

The following Subsets relate to the GBS genome:

GBS Subset 1: 1060 GBS genes which have homologues with GAS and with
pneumococcus (Table 8); 

GBS Subset 2: 225 GBS genes which have homologues with GAS, but not 
with pneumococcus (Table 10); 

GBS Subset 3: 176 GBS genes which have homologues with pneumococcus 
but not with GAS (Table 9); 

GBS Subset 4: 683 GBS genes which do not have homologues with GAS or 
pneumococcus (specific to GBS vs GAS and pneumococcus) (Table 11). 
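By way of illustration only, the four-way division of GBS genes described above can be sketched with ordinary set operations. The function name, its arguments and the gene identifiers below are hypothetical and are not taken from the Tables; the sketch assumes that, for each GBS gene, it has already been decided whether a homologue exists in GAS and in pneumococcus.

```python
def partition_subsets(genes, has_gas_homologue, has_spn_homologue):
    """Split gene identifiers into Subsets 1-4 from two homologue lists."""
    genes = set(genes)
    gas = genes & set(has_gas_homologue)   # genes with a GAS homologue
    spn = genes & set(has_spn_homologue)   # genes with a pneumococcus homologue
    return {
        1: gas & spn,          # homologues in both GAS and pneumococcus
        2: gas - spn,          # homologues in GAS but not pneumococcus
        3: spn - gas,          # homologues in pneumococcus but not GAS
        4: genes - gas - spn,  # no homologue in either species
    }


# Hypothetical gene identifiers, for illustration only.
toy = partition_subsets(
    genes=["g1", "g2", "g3", "g4", "g5"],
    has_gas_homologue=["g1", "g2"],
    has_spn_homologue=["g1", "g3"],
)
```

Because the four sets are built from intersections and differences of the same two homologue sets, every gene falls into exactly one Subset.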

The invention is based on the identification of the following subsets of genes 
within the GAS genome: 

GAS Subset 1: 1006 GAS genes which have homologues with GBS and with 
pneumococcus (Table 33); 

GAS Subset 2: 212 GAS genes which have homologues with GBS but do not 
have homologues with pneumococcus (Table 34); 

GAS Subset 3: 62 GAS genes which have homologues with pneumococcus
but do not have homologues with GBS (Table 35); 

GAS Subset 4: 416 GAS genes which do not have homologues with either 
GBS or pneumococcus. This Subset can be determined by subtracting the above 
subsets from the published genome. 

The invention is based on the identification of the following subsets of genes 
within the pneumococcus genome: 

Spn Subset 1: 1034 Spn genes which have homologues with GBS and GAS 
(Table 36); 

Spn Subset 2: 195 Spn genes which have homologues with GBS but do not 
have homologues with GAS (Table 37); 

Spn Subset 3: 74 Spn genes which have homologues with GAS but do not 
have homologues with GBS (Table 38); 

Spn Subset 4: 836 Spn genes which do not have homologues with either GBS
or GAS. This Subset can be determined by subtracting the above Subsets
from the published genome.
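The subtraction used to obtain Subset 4 for GAS and for pneumococcus may be pictured as a set difference. The following is a minimal sketch under the assumption that the published genome and Subsets 1 to 3 are each available as a list of ORF identifiers; the function name and the identifiers shown are hypothetical.

```python
def subset4_by_subtraction(published_genome, subset1, subset2, subset3):
    """Return the ORFs of the published genome that appear in none of Subsets 1-3."""
    return set(published_genome) - set(subset1) - set(subset2) - set(subset3)


# Hypothetical ORF identifiers, for illustration only.
genome = ["orf1", "orf2", "orf3", "orf4", "orf5", "orf6"]
remaining = subset4_by_subtraction(
    genome,
    subset1=["orf1", "orf2"],
    subset2=["orf3"],
    subset3=["orf4"],
)
```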

The invention further provides polynucleotides which are conserved or 
specific to Streptococcus based on a comparison with a wide range of published 
bacterial genomes. The following additional Subsets are provided: 

GBS Subset 1(a): Of the 1060 GBS genes which have homologues in both 
GAS and pneumococcus, 12 of those GBS genes do not have homologues with any of 
the other published bacterial genomes at the time of the invention (i.e., GBS Subset
1(a) is specific to Streptococcus vs non-Streptococcus published genomes). The 12
GBS ORFs are listed in Table 3.

GBS Subset 2(a): This Subset comprises GBS genes which have homologues 
with GAS, but not with pneumococcus or any other published bacterial genomes at 
the time of the invention. 

GBS Subset 3(a): This Subset comprises GBS genes which have homologues
with pneumococcus, but not with GAS or any other published bacterial genomes at
the time of the invention. 

GBS Subset 4(a): Of the 683 GBS genes which do not have homologues in 
either GAS or pneumococcus, 315 of these GBS genes also do not have homologues
with any of the other published bacterial genomes. These include six proteins
predicted to be anchored on the cell wall (SAG0677, SAG0771, SAG1052, SAG1331,
SAG1473, and SAG1168), three of the capsule-related genes (SAG1163, SAG1167,
and SAG1168), six transcriptional regulators, and four genes of the cyl operon
(SAG0663 - SAG0673) essential for GBS hemolytic activity and production of
pigment. See Pritzlaff et al. (2001) Mol. Microbiol. 39, 236 - 247. The rest of the
315 proteins include 240 hypothetical proteins with no similarity to other proteins in 
databases. 

Many of the 315 genes specific to S. agalactiae are located in regions likely to 
constitute mobile genetic elements. Two of these regions resemble prophages 
(SAG0545-SAG0610 and SAG1835-SAG1885) displaying a mosaic structure with 
segments most similar to different bacteriophages, a pattern that suggests frequent 
recombination events. PblA and PblB are adhesins from an S. mitis prophage, where
they contribute to endocarditis by binding to human platelets (see Bensing, et al.
(2001) Infect. Immun. 69, 6186 - 6192; Bensing, et al. (2001) Infect. Immun. 69, 1373
- 1380). Their orthologs in S. agalactiae are located on separate prophages and
display a different protein structure. Another region (SAG1247-SAG1299) encodes a 
putative conjugative transposon that carries genes for cadmium efflux and mercury 
resistance. 

GAS Subset 1(a): This Subset comprises GAS genes which have homologues 
with GBS and with pneumococcus, but do not have homologues with any of the other 
published bacterial genomes at the time of the invention. 

GAS Subset 2(a): This Subset comprises GAS genes which have homologues
with GBS but do not have homologues with pneumococcus or any of the other 
published bacterial genomes at the time of the invention; 

GAS Subset 3(a): This Subset comprises GAS genes which have homologues 
with pneumococcus but do not have homologues with GBS or any of the other 
published bacterial genomes at the time of the invention. 

GAS Subset 4(a): This Subset comprises GAS genes which do not have 
homologues with either GBS or pneumococcus or with any of the other published 
bacterial genomes at the time of the invention. 

Spn Subset 1(a): This Subset comprises Spn genes which have homologues 
with GBS and GAS but which do not have homologues with any of the other 
published bacterial genomes at the time of the invention; 

Spn Subset 2(a): This Subset comprises Spn genes which have homologues 
with GBS but do not have homologues with GAS or with any of the other published 
bacterial genomes at the time of the invention; 

Spn Subset 3(a): This Subset comprises Spn genes which have homologues 
with GAS but do not have homologues with GBS or with any of the other published 
bacterial genomes at the time of the invention; 

Spn Subset 4(a): This Subset comprises Spn genes which do not have
homologues with either GBS or GAS or with any of the other published
bacterial genomes at the time of the invention.

The invention also provides polynucleotides which are conserved or specific 
to GBS serotypes and/or clinical isolates. Applicants have sequenced 19 GBS genes 
from a variety of GBS serotypes in 11 different clinical isolates. The sequences of
these genes are set forth in Tables 13-31. The following additional subsets are 
provided: 

GBS Subset 1(b): of the 1060 GBS genes which have homologues with GAS 
and with pneumococcus, 47 of these GBS genes vary among the 11 clinical isolates. 
1013 of these GBS genes are conserved across the 11 clinical isolates. This list can
be determined by comparing the genes listed in Table 8 with the Comparative
Genome Hybridization in Figure 1. 

GBS Subset 2(b): of the 225 GBS genes which have homologues with GAS, 
but not pneumococcus, 44 of these GBS genes vary among the 11 clinical isolates.
181 of these GBS genes are conserved across the 11 clinical isolates. This list can be 
determined by comparing the genes listed in Table 10 with the Comparative Genome 
Hybridization in Figure 1. 

GBS Subset 3(b): of the 176 GBS genes which have homologues with
pneumococcus but not with GAS, 44 of these GBS genes vary among the 11 clinical
isolates. 132 of these GBS genes are conserved across the 11 clinical isolates. This
list can be determined
by comparing the genes listed in Table 9 with the Comparative Genome Hybridization 
in Figure 1. 

GBS Subset 4(b): of the 683 GBS genes which do not have homologues with 
GAS or pneumococcus, 260 GBS genes vary among the 11 clinical isolates. 423 of
these GBS genes are conserved across the 11 clinical isolates. This list can be 
determined by comparing the genes listed in Table 11 with the Comparative Genome 
Hybridization in Figure 1.
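The counts recited for GBS Subsets 1(b) to 4(b) can be cross-checked arithmetically: in each Subset, the number of genes that vary among the 11 clinical isolates plus the number conserved across them should equal the size of the parent Subset. The short sketch below performs that check using only the figures stated above; the variable and function names are illustrative.

```python
# (total genes in parent Subset, genes varying among isolates, genes conserved)
SUBSET_COUNTS = {
    "1(b)": (1060, 47, 1013),
    "2(b)": (225, 44, 181),
    "3(b)": (176, 44, 132),
    "4(b)": (683, 260, 423),
}


def check_counts(counts):
    """Return the Subsets whose variable + conserved counts differ from the total."""
    return [
        name
        for name, (total, variable, conserved) in counts.items()
        if variable + conserved != total
    ]


inconsistent = check_counts(SUBSET_COUNTS)
total_variable = sum(variable for _, variable, _ in SUBSET_COUNTS.values())
```

On the recited figures the check finds no inconsistency, and the variable genes across the four Subsets total 395.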

The invention further provides polynucleotides which are likely recent 
genomic duplications in GBS. These duplications include glycosyl transferases, 
sortases, proteins anchored on the cell wall, β-lactam resistance factors, and many
hypothetical proteins. The GBS genes are listed in Table 4 (GBS Subset 5).

The invention is also based on the identification of a cluster of 13 adjacent 
genes (SAG1410 - SAG1424) which is believed to encode enzymes required for 
synthesis of the group B carbohydrate, a complex multiantennary structure of rhamnose,
glucitol phosphate, N-acetylglucosamine, and galactose (GBS Subset 6). Predicted
proteins encoded within this cluster include seven putative glycosyltransferases, four
of which are similar to rhamnosyltransferases in other streptococcal species; a
putative dTDP-L-rhamnose synthase; and proteins involved in glucitol synthesis. All
nine recognized GBS capsular polysaccharide types contain sialic acid residues as part
of their repeating unit structure, a feature that contributes to virulence by inhibiting
activation of the alternative complement pathway. See Edwards et al. (1982) J.
Immunol. 128, 1278 - 1283.

The type V capsular polysaccharide gene cluster consists of 18 genes (GBS
Subset 6(a)). A region of glycosyltransferases and related proteins (SAG1162 -
SAG1170) that direct the synthesis of the type V polysaccharide repeat unit is flanked
on either side by genes that are conserved in all known GBS capsule serotypes.
Downstream of this region are genes that encode enzymes for the biosynthesis and
activation of sialic acid (SAG1158 - SAG1161). Upstream of the serotype-specific
region are genes (SAG1171 - SAG1175) found not only in all nine GBS capsular
serotypes but also in a variety of other polysaccharide-producing streptococci. 

The invention provides for methods of screening a Streptococcal genome for a 
conserved or a specific genomic sequence using one or more of the subsets of the 
invention.

The invention further provides for an immunogenic composition comprising a 
polypeptide expressed by one or more of the polynucleotides in one or more of the 
subsets of the invention, and methods for designing an immunogenic composition by 
selecting one or more polypeptides expressed by one or more of the polynucleotides 
in one or more of the subsets of the invention. 

The invention further provides for methods of screening compounds for 
activity against a Streptococcal bacterium, which method comprises contacting the
compounds with a polypeptide expressed by the polynucleotide from one of the 
subsets of the invention. 

The invention further provides for compositions comprising one or more of 
the polynucleotides, and fragments and derivatives thereof, selected from the group 
consisting of the sequences set forth in Tables 13-31. 

The invention further provides for compositions comprising polypeptides and 
fragments and derivatives thereof encoded by the polynucleotides set forth in Tables
13-31.

BRIEF DESCRIPTION OF THE TABLES AND DRAWINGS

Table 1 comprises a complete list of GBS predicted genes, listed by SAGxxxx
ORF number. This table also includes the predicted amino acid size of the predicted 
expressed protein and the predicted function, if known. 

Table 2 comprises a list of predicted and experimentally characterized surface 
and secreted proteins from GBS. 

Table 3 lists genes which were shared among GBS, GAS and pneumococcus, 
but which were not found in any of the other completely sequenced genomes. 

Table 4 depicts probable recently duplicated genes within GBS. 

Table 5 lists the 19 GBS strains used for comparative genome hybridisations 
and phylogenetic analysis. 

Table 6 lists clusters of genes derived from phylogenetic profiling of GBS 
strains based on comparative genome hybridisations. 

Table 7 lists genes and strains used for phylogenetic analyses of the 19 GBS
strains.

Table 8 lists the 1060 GBS ORFs which are conserved across GBS, GAS and 
pneumococcus. 

Table 9 lists the 176 GBS ORFs which are conserved across GBS and 
pneumococcus. 

Table 10 lists the 225 GBS ORFs which are conserved across GBS and GAS.

Table 11 lists the 683 GBS ORFs which are not shared with either GAS or
pneumococcus.

Table 12 lists the 315 GBS ORFs which are not shared with any published
genomic sequence. 

Table 13 lists the sequences of the 11 strains relating to GBS ORF SAG0466.

Table 14 lists the sequences of the 11 strains relating to GBS ORF SAG0471.

Table 15 lists the sequences of the 11 strains relating to GBS ORF SAG0492.

Table 16 lists the sequences of the 11 strains relating to GBS ORF SAG0767.

Table 17 lists the sequences of the 11 strains relating to GBS ORF SAG1086.

Table 18 lists the sequences of the 11 strains relating to GBS ORF SAG1600.

Table 19 lists the sequences of the 11 strains relating to GBS ORF SAG1680.

Table 20 lists the sequences of the 11 strains relating to GBS ORF SAG1723.

Table 21 lists the sequences of the 11 strains relating to GBS ORF SAG0079.

Table 22 lists the sequences of the 11 strains relating to GBS ORF SAG0093.

Table 23 lists the sequences of the 11 strains relating to GBS ORF SAG0163.

Table 24 lists the sequences of the 11 strains relating to GBS ORF SAG0290.

Table 25 lists the sequences of the 11 strains relating to GBS ORF SAG0368.

Table 26 lists the sequences of the 11 strains relating to GBS ORF SAG0503.

Table 27 lists the sequences of the 11 strains relating to GBS ORF SAG1473.

Table 28 lists the sequences of the 11 strains relating to GBS ORF SAG1552.

Table 29 lists the sequences of the 11 strains relating to GBS ORF SAG1641.

Table 30 lists the sequences of the 11 strains relating to GBS ORF SAG2147.

Table 31 lists the sequences of the 11 strains relating to GBS ORF SAG2148.

Table 32 provides a conversion table for the ORFxxxx reference numbers to 
the SAGxxxx reference numbers, which are available at the TIGR website on the day 
of the filing of this application. 

Table 33 lists the 1006 GAS ORFs which are shared with GBS and Spn. The 
genes corresponding to these ORFs were published in GenBank; the numbers for the 
GAS ORF refer directly to their GenBank entries. 

Table 34 lists the 212 GAS ORFs which are shared with GBS but which do 
not have homologues with pneumococcus. The genes corresponding to these ORFs 
were published in GenBank; the numbers for the GAS ORF refer directly to their 
GenBank entries. 

Table 35 lists the 62 GAS ORFs which have homologues with pneumococcus
but which do not have homologues with GBS. The genes corresponding to these
ORFs were published in GenBank; the numbers for the GAS ORF refer directly to
their GenBank entries. 

Table 36 lists the 1034 Spn ORFs which share homologues with GBS and
GAS. These ORFs were published in GenBank. The numbers for Spn correspond to 
the entry for AE005672. 

Table 37 lists the 195 Spn ORFs which share homologues with GBS but do 
not share homologues with GAS. These ORFs were published in GenBank. The 
numbers for Spn correspond to the entry for AE005672. 

Table 38 lists the 74 Spn ORFs which share homologues with GAS but do not 
share homologues with GBS. These ORFs were published in GenBank. The 
numbers for Spn correspond to the entry for AE005672. 

Figure 1 is a circular representation of the GBS genome and comparative 
hybridisations using microarrays. 

Figure 2 is a schematic representation of in silico comparisons between 
streptococci. 

Figure 3 depicts a phylogenetic tree of GBS strains based on PCR sequences.

Figure 4 depicts a linear representation of the GBS genome.

Figure 5 demonstrates phylogenetic profiling of GBS strains based on 
comparative genome hybridisations. 

BRIEF DESCRIPTION OF THE SEQUENCE ID NOS. 

The following SEQ ID NOS are used in the application and figures. 

SEQ ID NOS. 1301 - 1316 represent the polynucleotide sequences 
corresponding to the SAG0466 ORF (thiolase) in the GBS strains indicated for each 
sequence, including where indicated reverse complements. 

SEQ ID NOS. 1401 - 1417 represent the polynucleotide sequences 
corresponding to the SAG0471 ORF (glucokinase) in the GBS strains indicated for 
each sequence, including where indicated reverse complements. 

SEQ ID NOS. 1501 - 1511 represent the polynucleotide sequences
corresponding to the SAG0492 ORF (amino acid ABC transporter, ATP-binding
protein) in the GBS strains indicated for each sequence, including where indicated
reverse complements. 

SEQ ID NOS. 1601 - 1617 represent the polynucleotide sequences 
corresponding to the SAG0767 ORF (D-alanine - D-alanine ligase) in the GBS strains 
indicated for each sequence, including where indicated reverse complements. 

SEQ ID NOS. 1701 - 1711 represent the polynucleotide sequences
corresponding to the SAG1086 ORF (xanthine phosphoribosyltransferase) in the GBS 
strains indicated for each sequence, including where indicated reverse complements. 

SEQ ID NOS. 1801 - 1814 represent the polynucleotide sequences 
corresponding to the SAG1600 ORF (glutamate racemase) in the GBS strains 
indicated for each sequence, including where indicated reverse complements. 

SEQ ID NOS. 1901 - 1914 represent the polynucleotide sequences 
corresponding to the SAG1680 ORF (shikimate 5-dehydrogenase) in the GBS strains 
indicated for each sequence, including where indicated reverse complements. 

SEQ ID NOS. 2001 - 2010 represent the polynucleotide sequences 
corresponding to the SAG1723 ORF (signal peptidase I) in the GBS strains indicated 
for each sequence, including where indicated reverse complements. 

SEQ ID NOS. 2101 - 2112 represent the polynucleotide sequences
corresponding to the SAG0079 ORF (adenylate kinase) in the GBS strains indicated 
for each sequence, including where indicated reverse complements. 

SEQ ID NOS. 2201 - 2211 represent the polynucleotide sequences
corresponding to the SAG0093 ORF (D-alanyl-D-alanine carboxypeptidase family 
protein) in the GBS strains indicated for each sequence, including where indicated 
reverse complements. 

SEQ ID NOS. 2301 - 2311 represent the polynucleotide sequences
corresponding to the SAG0163 ORF (competence protein CglA) in the GBS strains 
indicated for each sequence, including where indicated reverse complements. 

SEQ ID NOS. 2401 - 2410 represent the polynucleotide sequences 
corresponding to the SAG0290 ORF (ABC transporter, substrate-binding protein) in
the GBS strains indicated for each sequence, including where indicated reverse
complements. 

SEQ ID NOS. 2501 - 2511 represent the polynucleotide sequences
corresponding to the SAG0368 ORF (protein of unknown function) in the GBS strains 
indicated for each sequence, including where indicated reverse complements. 

SEQ ID NOS. 2601 - 2609 represent the polynucleotide sequences 
corresponding to the SAG0503 ORF (lipase/acylhydrolase) in the GBS strains 
indicated for each sequence, including where indicated reverse complements. 

SEQ ID NOS. 2701 - 2711 represent the polynucleotide sequences
corresponding to the SAG1473 ORF (cell wall surface anchor family protein) in the 
GBS strains indicated for each sequence, including where indicated reverse 
complements. 

SEQ ID NOS. 2801 - 2811 represent the polynucleotide sequences 
corresponding to the SAG1552 ORF (conserved hypothetical protein) in the GBS 
strains indicated for each sequence, including where indicated reverse complements. 

SEQ ID NOS. 2901 - 2911 represent the polynucleotide sequences
corresponding to the SAG1641 ORF (YaeC family protein) in the GBS strains 
indicated for each sequence, including where indicated reverse complements. 

SEQ ID NOS. 3001 - 3010 represent the polynucleotide sequences 
corresponding to the SAG2147 ORF (protein of unknown function / lipoprotein, 
putative) in the GBS strains indicated for each sequence, including where indicated 
reverse complements. 

SEQ ID NOS. 3101 - 3111 represent the polynucleotide sequences 
corresponding to the SAG2148 ORF (LysM domain protein) in the GBS strains 
indicated for each sequence, including where indicated reverse complements. 
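For convenience, the SEQ ID NO ranges recited above can be held in a simple lookup structure that returns the ORF for a given SEQ ID NO. The sketch below uses only the ranges and ORF names listed above; the names SEQ_ID_RANGES and orf_for_seq_id are hypothetical. It may also be observed that the hundreds block of each range appears to track the number of the Table (13 to 31) holding the corresponding sequences.

```python
# (first SEQ ID NO, last SEQ ID NO, GBS ORF), as recited in this section.
SEQ_ID_RANGES = [
    (1301, 1316, "SAG0466"), (1401, 1417, "SAG0471"), (1501, 1511, "SAG0492"),
    (1601, 1617, "SAG0767"), (1701, 1711, "SAG1086"), (1801, 1814, "SAG1600"),
    (1901, 1914, "SAG1680"), (2001, 2010, "SAG1723"), (2101, 2112, "SAG0079"),
    (2201, 2211, "SAG0093"), (2301, 2311, "SAG0163"), (2401, 2410, "SAG0290"),
    (2501, 2511, "SAG0368"), (2601, 2609, "SAG0503"), (2701, 2711, "SAG1473"),
    (2801, 2811, "SAG1552"), (2901, 2911, "SAG1641"), (3001, 3010, "SAG2147"),
    (3101, 3111, "SAG2148"),
]


def orf_for_seq_id(seq_id):
    """Return the ORF whose SEQ ID NO range contains seq_id, or None if unassigned."""
    for first, last, orf in SEQ_ID_RANGES:
        if first <= seq_id <= last:
            return orf
    return None


def table_for_seq_id(seq_id):
    """Return the Table number implied by the hundreds block, or None if unassigned."""
    return seq_id // 100 if orf_for_seq_id(seq_id) is not None else None
```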

DETAILED DESCRIPTION OF THE INVENTION 

The invention relates to polynucleotides which are conserved or specific to 
one or more species of Streptococcus, Streptococcus species serotypes, and/or 
serotype isolates. In particular, the invention relates to polynucleotides from
Streptococcus which are conserved or specific to one or more of the species of S.
pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group A streptococcus" or
"GAS"), and S. agalactiae ("group B streptococcus" or "GBS"). The invention
further relates to polynucleotides which are conserved or specific to one or more
Streptococcal species serotypes, such as GBS serotypes Ia, Ib, II, III, IV, V, VI, VII,
and VIII. The invention still further relates to polynucleotides which are conserved or 
specific to one or more clinical isolates of a Streptococcus species. 

In order to facilitate an understanding of the invention, selected terms used in 
the application will be discussed below. 

As used herein, the phrase "species of Streptococcus" generally refers to
species of the Streptococcus genus, including S. pneumoniae ("pneumococcus" or
"S. pn."), S. pyogenes ("group A streptococcus" or "GAS") and S. agalactiae ("group B
streptococcus" or "GBS").

As used herein, the phrase "Streptococcus species serotypes" generally refers
to subdivisions based on a distinguishing characteristic within a specific 
Streptococcus species. The distinguishing characteristic can be identified by any of a 
wide range of diagnostic tools. For instance, GBS is generally recognized as 
comprising at least nine subdividing serotypes based on the structure of their 
polysaccharide capsule. 

As used herein, the phrases "serotype isolates" or "clinical isolates" generally
refer to specific isolated bacterial strains of a specific Streptococcal species and 
serotype. 

As used herein in reference to bacterial genomes, the phrases "conserved" or
"shared" generally refer to genomic sequences which have homologues in the two or
more genomes in the reference. Homologous sequences preferably have greater than 
50% identity (e.g., 60%, 70%, 80%, 90%, 95%, 99% or more). 

As used herein in reference to bacterial genomes, the phrases "specific to" or 
"not shared" generally refer to genomic sequences which do not have homologues in 
the two or more genomes in the reference. Sequences which are not homologues
preferably have less than 50% identity (e.g., 40%, 35%, 30%, 25%, 20%, 15%, or 
less). 
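The two definitions above amount to a threshold test on percent identity. A minimal sketch follows, assuming that a percent identity has already been computed for the pair of sequences being compared. The function name is hypothetical, and the handling of a value of exactly 50% is an assumption, since the definitions address only values greater than and less than 50%.

```python
def classify_by_identity(percent_identity, threshold=50.0):
    """Label a sequence pair using the preferred 50% identity threshold."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity > threshold:
        return "conserved"      # homologous: greater than the threshold
    if percent_identity < threshold:
        return "specific"       # not homologous: less than the threshold
    return "undetermined"       # exactly at the threshold (assumed handling)
```

The threshold is exposed as a parameter so that the stricter preferred values (for example 60%, 70% or more for conserved sequences) can be applied without changing the logic.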

Identity between nucleotide sequences can be determined using software 
programs known in the art, for example those described in section 7.7.18 of Current 
Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987) Supplement 30. A
preferred alignment program is GCG Gap (Genetics Computer Group, Wisconsin, 
Suite Version 10.1), preferably using default parameters, which are as follows: open 
gap = 3; extend gap = 1. 

Sequences within a Subset of the invention include sequences which hybridize 
to the listed genes. Hybridization reactions can be performed under conditions of 
different "stringency". Conditions that increase stringency of a hybridization reaction
are widely known and published in the art [e.g. page 7.52 of Sambrook et al. (1989)
Molecular Cloning: A Laboratory Manual. NY, Cold Spring Harbor Laboratory]. 
Examples of relevant conditions include (in order of increasing stringency): 
incubation temperatures of 25°C, 37°C, 50°C, 55°C and 68°C; buffer concentrations 
of 10 x SSC, 6 x SSC, 1 x SSC, 0.1 x SSC (where SSC is 0.15 M NaCl and 15 mM 
citrate buffer) and their equivalents using other buffer systems; formamide 
concentrations of 0%, 25%, 50%, and 75%; incubation times from 5 minutes to 24 
hours; 1, 2, or more washing steps; wash incubation times of 1, 2, or 15 minutes; and 
wash solutions of 6 x SSC, 1 x SSC, 0.1 x SSC, or de-ionized water. Hybridization 
techniques and their optimization are well known in the art [e.g. see Sambrook et al.; 
RNA Methodologies (Farrell, 1998) (Academic Press; ISBN 0-12-249695-7); Current 
Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987) Supplement 30; 
Short Protocols in Molecular Biology (4th edition, 1999) Ausubel et al. eds. ISBN 
0-471-32938-X; US patent 5,707,829 etc.]. 
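
As a rough illustration of the buffer strengths listed above, the following Python sketch scales the 1 x SSC definition given in the text (0.15 M NaCl and 15 mM citrate) to the other strengths mentioned. The function name ssc_components is hypothetical and is introduced only for this example.

```python
# 1 x SSC as defined in the text: 0.15 M NaCl and 15 mM citrate buffer.
NACL_M_AT_1X = 0.15
CITRATE_MM_AT_1X = 15.0


def ssc_components(fold):
    """Return (NaCl in M, citrate in mM) for an SSC buffer of the given strength.

    `fold` is the multiple of 1 x SSC, e.g. 6 for 6 x SSC or 0.1 for 0.1 x SSC.
    Values are rounded to 6 decimal places to hide floating-point noise.
    """
    if fold <= 0:
        raise ValueError("SSC strength must be positive")
    return round(NACL_M_AT_1X * fold, 6), round(CITRATE_MM_AT_1X * fold, 6)


# Buffer strengths named in the text, listed in order of increasing stringency.
for fold in (10, 6, 1, 0.1):
    nacl, citrate = ssc_components(fold)
    print(f"{fold} x SSC: {nacl} M NaCl, {citrate} mM citrate")
```

Lower salt corresponds to higher stringency, so the 0.1 x SSC wash is the most stringent of the SSC solutions listed.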

Identity between polypeptide sequences can be determined using software 
programs known in the art, for example those described in section 7.7.18 of Current 
Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987) Supplement 30. A 
preferred alignment is determined by the Smith-Waterman homology search 
algorithm [Smith & Waterman (1981) Adv. Appl. Math. 2: 482-489] using an affine 
gap search with a gap open penalty of 12 and a gap extension penalty of 2, BLOSUM 
matrix 62. 
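
The affine gap search described above can be sketched in Python as follows. This is a minimal illustration only, not the software actually used. It makes two assumptions: a gap of length k is charged as open + (k - 1) x extension, and a simple match/mismatch score (simple_score, a hypothetical stand-in) replaces the BLOSUM62 matrix to keep the example short. The default penalties of 12 and 2 are those stated in the text.

```python
def simple_score(x, y):
    """Stand-in substitution score (assumption): +5 match, -4 mismatch.

    A real search as described in the text would look up BLOSUM62 instead.
    """
    return 5 if x == y else -4


def smith_waterman_affine(a, b, score=simple_score, gap_open=12, gap_extend=2):
    """Return the best local alignment score using affine gap penalties.

    H holds the best score ending at (i, j); E and F hold the best scores
    ending in a gap in `a` or in `b` respectively (Gotoh-style recurrences).
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap from H, or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Nine matches (9 x 5 = 45) minus one gap opening (12) gives 33.
print(smith_waterman_affine("ACDEFGHIKL", "ACDEFHIKL"))
```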

Typically, 50% identity or more between two proteins may be considered to 
be an indication of functional equivalence. References to a percentage sequence 
identity between two amino acid sequences means that, when aligned, that percentage 
of amino acids are the same in comparing the two sequences. 
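
As a small worked example of the percentage identity calculation, the Python sketch below counts identical positions in two sequences that have already been aligned. It assumes that gap positions are written as "-" and that the percentage is taken over the full alignment length, with gap columns counted as non-identical; the text does not fix either convention, and the function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same residue in both sequences.

    Both inputs must be the same length, i.e. already aligned. Columns that
    contain a gap character are never counted as identical (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * same / len(aligned_a)


# 6 identical columns out of 8 gives 75.0, above the 50% level noted in the text.
print(percent_identity("MKTAYIAK", "MKT-YLAK"))
```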

The terms "polypeptide", "protein" and "amino acid sequence" as used herein 
generally refer to a polymer of amino acid residues and are not limited to a minimum 
length of the product. Thus, peptides, oligopeptides, dimers, multimers, and the like 
are included within the definition. Both full-length proteins and fragments thereof are 
encompassed by the definition. Minimum fragments of polypeptides useful in the 
invention can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or even 15 amino acids. 
Typically, polypeptides useful in this invention can have a maximum length suitable 
for the intended application. Generally, the maximum length is not critical and can 
easily be selected by one skilled in the art. 

Reference to polypeptides and the like also includes derivatives of the amino 
acid sequences of the invention. Such derivatives can include postexpression 
modifications of the polypeptide, for example, glycosylation, acetylation, 
phosphorylation, and the like. Amino acid derivatives can also include modifications 
to the native sequence, such as deletions, additions and substitutions (generally 
conservative in nature), so long as the protein maintains the desired activity. These 
modifications may be deliberate, as through site-directed mutagenesis, or may be 
accidental, such as through mutations of hosts which produce the proteins or errors 
due to PCR amplification. Furthermore, modifications may be made that have one or 
more of the following effects: reducing toxicity; facilitating cell processing (e.g., 
secretion, antigen presentation, etc.); and facilitating presentation to B-cells and/or 
T-cells. 

"Fragment" or "Portion" as used herein refers to a polypeptide consisting of 
only a part of the intact full-length polypeptide sequence and structure as found in 
nature. For instance, a fragment can include a C-terminal deletion and/or an 
N-terminal deletion of a protein. 

A "recombinant" protein is a protein which has been prepared by recombinant 
DNA techniques as described herein. In general, the gene of interest is cloned and 
then expressed in transformed organisms, as described further below. The host 
organism expresses the foreign gene to produce the protein under expression 
conditions. 

The term "polynucleotide", as known in the art, generally refers to a nucleic 
acid molecule. A "polynucleotide" can include both double- and single-stranded 
sequences and refers to, but is not limited to, cDNA from viral, prokaryotic or 
eukaryotic mRNA, genomic RNA and DNA sequences from viral (e.g. RNA and 
DNA viruses and retroviruses) or prokaryotic DNA, and especially synthetic DNA 
sequences. The term also captures sequences that include any of the known base 
analogs of DNA and RNA, and includes modifications such as deletions, additions 
and substitutions (generally conservative in nature), to the native sequence, so long as 
the nucleic acid molecule encodes a therapeutic or antigenic protein. These 
modifications may be deliberate, as through site-directed mutagenesis, or may be 
accidental, such as through mutations of hosts that produce the antigens. 
Modifications of polynucleotides may have any number of effects including, for 
example, facilitating expression of the polypeptide product in a host cell. 
The term "polynucleotide" further includes DNA, RNA, DNA/RNA hybrids, DNA 
and RNA analogues such as those containing modified backbones (with modifications 
in the sugar and/or phosphates e.g. phosphorothioates, phosphoramidites etc.), and 
also peptide nucleic acids (PNA) and any other polymer comprising purine and 
pyrimidine bases or other natural, chemically or biochemically modified, non-natural, 
or derivatized nucleotide bases etc. Nucleic acid according to the invention can be 
prepared in many ways (e.g. by chemical synthesis, from genomic or cDNA libraries, 
from the organism itself etc.) and can take various forms (e.g. single stranded, double 
stranded, vectors, probes etc.). 

A polynucleotide can encode a biologically active (e.g., immunogenic or 
therapeutic) protein or polypeptide. Depending on the nature of the polypeptide 
encoded by the polynucleotide, a polynucleotide can include as little as 10 
nucleotides, e.g., where the polynucleotide encodes an antigen. 

By "isolated" is meant, when referring to a polynucleotide or a polypeptide, 
that the indicated molecule is separate and discrete from the whole organism with 
which the molecule is found in nature or, when the polynucleotide or polypeptide is 
not found in nature, is sufficiently free of other biological macromolecules so that the 
polynucleotide or polypeptide can be used for its intended purpose. 

"Antibody" as known in the art includes one or more biological moieties that, 
through chemical or physical means, can bind to or associate with an epitope of a 
polypeptide of interest. The antibodies of the invention specifically bind to infectious 
prion conformations. The term "antibody" includes antibodies obtained from both 
polyclonal and monoclonal preparations, as well as the following: hybrid (chimeric) 
antibody molecules (see, for example, Winter et al. (1991) Nature 349:293-299; and 
U.S. Patent No. 4,816,567); F(ab')2 and F(ab) fragments; Fv molecules (non-covalent 
heterodimers, see, for example, Inbar et al. (1972) Proc Natl Acad Sci USA 
69:2659-2662; and Ehrlich et al. (1980) Biochem 12:4091-4096); single-chain Fv molecules 
(sFv) (see, for example, Huston et al. (1988) Proc Natl Acad Sci USA 85:5879-5883); 
dimeric and trimeric antibody fragment constructs; minibodies (see, e.g., Pack et al. 
(1992) Biochem 31:1579-1584; Cumber et al. (1992) J Immunology 149B:120-126); 
humanized antibody molecules (see, for example, Riechmann et al. (1988) Nature 
332:323-327; Verhoeyen et al. (1988) Science 239:1534-1536; and U.K. Patent 
Publication No. GB 2,276,169, published 21 September 1994); and any functional 
fragments obtained from such molecules, wherein such fragments retain 
immunological binding properties of the parent antibody molecule. The term 
"antibody" further includes antibodies obtained through non-conventional processes, 
such as phage display. 

As used herein, the term "monoclonal antibody" refers to an antibody 
composition having a homogeneous antibody population. The term is not limited 
regarding the species or source of the antibody, nor is it intended to be limited by the 
manner in which it is made. Thus, the term encompasses antibodies obtained from 
murine hybridomas, as well as human monoclonal antibodies obtained using human 
rather than murine hybridomas. See, e.g., Cote, et al. Monoclonal Antibodies and 
Cancer Therapy, Alan R. Liss, 1985, p 77. 

An "immunogenic composition" as used herein refers to a composition that 
comprises an antigenic molecule where administration of the composition to a subject 
results in the development in the subject of a humoral and/or a cellular immune 
response to the antigenic molecule of interest. The immunogenicity of the 
composition may be facilitated by the use of an adjuvant. The immunogenic 
composition can be introduced directly into a recipient subject, such as by injection, 
inhalation, oral, intranasal or any other parenteral or mucosal (e.g., intra-rectally or 
intra-vaginally) route of administration. 

The practice of the present invention will employ, unless otherwise indicated, 
conventional methods of chemistry, biochemistry, molecular biology, immunology 
and pharmacology, within the skill of the art. Such techniques are explained fully in 
the literature. See, e.g., Remington's Pharmaceutical Sciences, 18th Edition (Easton, 
Pennsylvania: Mack Publishing Company, 1990); Methods In Enzymology (S. 
Colowick and N. Kaplan, eds., Academic Press, Inc.); and Handbook of Experimental 
Immunology, Vols. I-IV (D.M. Weir and C.C. Blackwell, eds., 1986, Blackwell 
Scientific Publications); Sambrook, et al., Molecular Cloning: A Laboratory Manual 
(2nd Edition, 1989); Handbook of Surface and Colloidal Chemistry (Birdi, K.S. ed., 
CRC Press, 1997); Short Protocols in Molecular Biology, 4th ed. (Ausubel et al. eds., 
1999, John Wiley & Sons); Molecular Biology Techniques: An Intensive Laboratory 
Course, (Ream et al., eds., 1998, Academic Press); PCR (Introduction to 
Biotechniques Series), 2nd ed (Newton & Graham eds., 1997, Springer Verlag); 
Peters and Dalrymple, Fields Virology (2d ed), Fields et al. (eds.), B.N. Raven Press, 



It is understood that the antibodies and methods of this invention are not 
limited to particular formulations or process parameters as such may, of course, vary. 
It is also to be understood that the terminology used herein is for the purpose of 
describing particular embodiments of the invention only, and is not intended to be 
limiting. 

All publications, patents and patent applications cited herein are hereby 
incorporated by reference in their entirety. 

Vaccines and Immunisation 

The invention provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is conserved across one or more species of Streptococcus. 

The polynucleotide is preferably conserved across one or more species of 
Streptococcus selected from the group consisting of GBS, GAS and pneumococcus. 
In one embodiment, the polynucleotide is a GBS polynucleotide which is homologous 
with at least one gene from both GAS and pneumococcus. Preferably, the GBS 
polynucleotide is selected from GBS Subset 1, which includes 1060 GBS genes which 
have homologues with both GAS and pneumococcus (Table 8). 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous with at least one gene from both GBS and pneumococcus. Preferably, 
the GAS polynucleotide is selected from GAS Subset 1, which includes 1006 GAS 
genes which have homologues with both GBS and pneumococcus. 

In another embodiment, the polynucleotide is a pneumococcal polynucleotide 
which is homologous with at least one gene from both GAS and GBS. Preferably, the 
pneumococcus polynucleotide is selected from Spn Subset 1, which includes 1034 
pneumococcal genes which have homologues with both GBS and GAS. 

In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous with at least one gene from GAS. Preferably, the polynucleotide is 
selected from one of the genes listed in GBS Subset 2, which includes 225 GBS genes 
which have homologues with GAS, but not with pneumococcus. 

In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous with at least one gene from pneumococcus. Preferably, the 
polynucleotide is selected from GBS Subset 3, which includes 176 GBS genes which 
have homologues with pneumococcus. 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous with at least one gene from GBS. Preferably, the polynucleotide is 
selected from GAS Subset 2, which includes 212 GAS genes which have a 
homologue with GBS. 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous with at least one gene from pneumococcus. Preferably, the polynucleotide 
is selected from GAS Subset 3, which includes 62 GAS genes which have a 
homologue with pneumococcus. 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide 
which is homologous with at least one gene from GBS. Preferably, the 
polynucleotide is selected from Spn Subset 2, which includes 195 Spn genes which 
have a homologue with GBS. 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide 
which is homologous with at least one gene from GAS. Preferably, the 
polynucleotide is selected from Spn Subset 3, which includes 74 Spn genes which 
have a homologue with GAS. 

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is specific to one or more species of Streptococcus. 

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
which is specific to GBS, GAS and pneumococcus. In one embodiment, the 
polynucleotide is a GBS polynucleotide which is homologous to at least one gene 
from both GAS and pneumococcus. Preferably, the GBS polynucleotide is selected 
from GBS Subset 1. In an alternative embodiment, the polynucleotide is a GBS 
polynucleotide which is homologous to at least one gene from both GAS and 
pneumococcus, but which is not homologous to a gene in any other published 
bacterial genome at the time of the invention. Preferably, the GBS polynucleotide is 
selected from one of the 12 GBS genes included in GBS Subset 1(a). (Table 3). 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous to at least one gene in both GBS and pneumococcus. Preferably, the 
GAS polynucleotide is selected from GAS Subset 1. In another embodiment, the 
polynucleotide is a GAS polynucleotide which is homologous to at least one gene in 
both GBS and pneumococcus but which is not homologous to any gene in any other 
published bacterial genome at the time of the invention. Preferably, the GAS 
polynucleotide is selected from GAS Subset 1(a). 

Alternatively, the polynucleotide is a pneumococcus polynucleotide which is 
homologous to at least one gene in both GBS and GAS. Preferably, the 
pneumococcus polynucleotide is selected from Spn Subset 1(a). In another 
embodiment, the polynucleotide is a pneumococcus polynucleotide which is 
homologous to at least one gene in both GBS and GAS but which does not have a 
homologue in any other published bacterial genome at the time of the invention. 
Preferably, the pneumococcus polynucleotide is selected from Spn Subset 1(a). 

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is specific to GBS. In one embodiment, the polynucleotide is a GBS 
polynucleotide which is not homologous to a gene in either GAS or pneumococcus. 
Preferably, the GBS polynucleotide is selected from one of the 683 GBS genes 
included in GBS Subset 4. In a further embodiment, the polynucleotide is a GBS 
polynucleotide which is not homologous to a gene in either GAS or pneumococcus or 
any other published bacterial genome at the time of the invention. Preferably, the 
GBS polynucleotide is selected from one of the 315 GBS genes in GBS Subset 4(a). 
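
The way a GBS gene falls into Subsets 1 to 4 can be sketched as follows. The Python below is an illustrative reading of the definitions in the text, not the procedure actually used to build the Subsets: it assumes that each GBS gene has a best percentage identity against GAS and against pneumococcus, and that a homologue exists when that identity is greater than 50% (the preferred cut-off stated in the definitions). The function name gbs_subset is hypothetical.

```python
HOMOLOGY_CUTOFF = 50.0  # percent identity; preferred cut-off from the definitions

# Subset sizes for GBS as stated in the text.
GBS_SUBSET_SIZES = {1: 1060, 2: 225, 3: 176, 4: 683}


def gbs_subset(identity_vs_gas, identity_vs_spn, cutoff=HOMOLOGY_CUTOFF):
    """Return the GBS Subset number (1-4) for a gene.

    Subset 1: homologue in both GAS and pneumococcus.
    Subset 2: homologue in GAS only.
    Subset 3: homologue in pneumococcus only.
    Subset 4: no homologue in either (GBS-specific).
    An identity of exactly the cut-off is treated as "no homologue" (assumption).
    """
    in_gas = identity_vs_gas > cutoff
    in_spn = identity_vs_spn > cutoff
    if in_gas and in_spn:
        return 1
    if in_gas:
        return 2
    if in_spn:
        return 3
    return 4


# The four Subsets are mutually exclusive, so their sizes add up to the
# number of GBS genes that were classified.
total_gbs_genes = sum(GBS_SUBSET_SIZES.values())
print(total_gbs_genes)  # 2144
```

The same four-way split applies to GAS and pneumococcus genes, with the two comparison genomes swapped accordingly.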

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is specific to GAS. In one embodiment, the polynucleotide is a GAS 
polynucleotide which is not homologous to a gene in either GBS or pneumococcus. 
Preferably, the GAS polynucleotide is selected from one of the 416 GAS genes 
included in GAS Subset 4. In a further embodiment, the polynucleotide is a GAS 
polynucleotide which does not have a homologue in either GBS or pneumococcus or 
in any other published bacterial genome at the time of the invention. Preferably, the 
GAS polynucleotide is selected from GAS Subset 4(a). 

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is specific to pneumococcus. In one embodiment, the polynucleotide 
is a pneumococcus polynucleotide which is not homologous to a gene in either GBS 
or GAS. Preferably, the pneumococcus polynucleotide is selected from one of the 
836 Spn genes included in Spn Subset 4. In a further embodiment, the polynucleotide 
is a pneumococcus polynucleotide which does not have a homologue in either GBS or 
GAS or in any other published bacterial genome at the time of the invention. 
Preferably, the pneumococcus polynucleotide is selected from Spn Subset 4(a). 

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is specific to GBS and GAS. In one embodiment, the polynucleotide 
is a GBS polynucleotide which is homologous to at least one gene from GAS but is 
not homologous to a gene from pneumococcus. Preferably, the GBS polynucleotide 
is selected from one of the 225 GBS genes included in GBS Subset 2. In another 
embodiment, the GBS polynucleotide is homologous to at least one gene from GAS 
but is not homologous to any gene from pneumococcus and does not have a 
homologue in any other published bacterial genome at the time of the invention. 
Preferably, the GBS polynucleotide is selected from GBS Subset 2(a). 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous to at least one gene from GBS but is not homologous to any gene from 
pneumococcus. Preferably, the GAS polynucleotide is selected from one of the 212 
GAS genes included in GAS Subset 2. In another embodiment, the GAS 
polynucleotide is homologous to at least one gene from GBS but is not homologous to 
any gene from pneumococcus and does not have a homologous gene in any other 
published bacterial genome at the time of the invention. Preferably, the GAS 
polynucleotide is selected from GAS Subset 2(a). 

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is specific to GBS and pneumococcus. In one embodiment, the 
polynucleotide is a GBS polynucleotide which is homologous to at least one gene 
from pneumococcus but is not homologous to any gene from GAS. Preferably, the 
GBS polynucleotide is selected from one of the 176 GBS genes included in GBS 
Subset 3. In another embodiment, the polynucleotide is a GBS polynucleotide which 
is homologous with at least one gene from pneumococcus but is not homologous with 
any GAS polynucleotide and does not have a homologous gene in any of the other 
published bacterial genomes at the time of the invention. Preferably, the GBS 
polynucleotide is selected from GBS Subset 3(a). 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide 
which is homologous with at least one gene from GBS, but is not homologous with 
any gene from GAS. Preferably, the pneumococcus polynucleotide is selected from 
one of the 195 Spn genes included in Spn Subset 2. In another embodiment, the 
polynucleotide is a pneumococcus polynucleotide which is homologous with at least 
one gene from GBS, but is not homologous with any gene from GAS and does not 
have a homologous gene in any other published bacterial genome at the time of the 
invention. Preferably, the pneumococcus polynucleotide is selected from Spn Subset 
3(a). 

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is specific to GAS and pneumococcus. In one embodiment, the 
polynucleotide is a GAS polynucleotide which is homologous with at least one gene 
from pneumococcus but is not homologous with any gene from GBS. Preferably, the 
GAS polynucleotide is selected from one of the 62 GAS genes included in GAS 
Subset 3. In another embodiment, the polynucleotide is a GAS polynucleotide which 
is homologous with at least one gene from pneumococcus but is not homologous with 
any gene from GBS and is not homologous with any gene of any published bacterial 
genome at the time of the invention. Preferably, the GAS polynucleotide is selected 
from GAS Subset 3(a). 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide 
which is homologous with at least one GAS polynucleotide, but is not homologous 
with any GBS gene. Preferably, the pneumococcus polynucleotide is selected from 
one of the 74 Spn genes included in Spn Subset 3. In another embodiment, the 
polynucleotide is a pneumococcus polynucleotide which is homologous with at least 
one gene from GAS, but is not homologous with any gene from GBS or with a gene 
from any other published bacterial genome at the time of the invention. Preferably, 
the pneumococcus polynucleotide is selected from Spn Subset 3(a). 

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is specific to one or more Streptococcal species serotypes. 
Preferably, the polynucleotide is specific to a Streptococcal species serotype selected 
from the Streptococcal species GBS, GAS and pneumococcus. More preferably, the 
polynucleotide is specific to one or more GBS serotypes selected from the group 
consisting of GBS serotype Ia, Ib, II, III, IV, V, VI, VII and VIII. 

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is conserved across one or more Streptococcal species serotypes. 
Preferably, the polynucleotide is specific to a Streptococcal species serotype selected 
from the Streptococcal species GBS, GAS and pneumococcus. More preferably, the 
polynucleotide is conserved across one or more GBS serotypes selected from the 
group consisting of GBS serotype Ia, Ib, II, III, IV, V, VI, VII and VIII. 

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is specific to one or more clinical isolates of a Streptococcal species. 
Preferably, the polynucleotide is specific to a Streptococcal species clinical isolate 
selected from the Streptococcal species GBS, GAS and pneumococcus. More 
preferably, the polynucleotide is specific to one or more GBS clinical isolates selected 
from the clinical isolates identified in Table 5. Still more preferably, the 
polynucleotide is specific to one or more GBS clinical isolates having one or more 
genes selected from the genes listed in Table 7. 

In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous to at least one gene from both GAS and pneumococcus and which varies 
among clinical isolates. In another embodiment, the polynucleotide is a GBS 
polynucleotide which is homologous to at least one gene from both GAS and 
pneumococcus and which is homologous with at least one gene from at least one of 
the clinical isolates identified in Table 5. In another embodiment, the polynucleotide 
is a GBS polynucleotide which is homologous to at least one gene from both GAS 
and pneumococcus and which is homologous with at least one gene from each of the 
clinical isolates identified in Table 5. Preferably, the polynucleotide is selected from 
one of the genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous to at least one gene from GAS and is not homologous to any gene from 
pneumococcus and which varies among clinical isolates. In another embodiment, the 
polynucleotide is a GBS polynucleotide which is homologous to at least one gene 
from GAS and is not homologous to any gene from pneumococcus and which is 
homologous to at least one gene from at least one of the clinical isolates identified in 
Table 5. In another embodiment, the polynucleotide is a GBS polynucleotide which 
is homologous to at least one gene from GAS and is not homologous to any gene from 
pneumococcus and which is homologous to at least one gene from each of the clinical 
isolates identified in Table 5. Preferably, the polynucleotide is selected from one of 
the genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous to at least one gene from pneumococcus and is not homologous to any 
gene from GAS and which varies among clinical isolates. In another embodiment, the 
polynucleotide is a GBS polynucleotide which is homologous to at least one gene 
from pneumococcus and is not homologous to any gene from GAS and which is 
homologous to at least one gene from at least one of the clinical isolates identified in 
Table 5. In another embodiment, the polynucleotide is a GBS polynucleotide which 
is homologous to at least one gene from pneumococcus and is not homologous to any 
gene from GAS and which is homologous to at least one gene from each of the 
clinical isolates identified in Table 5. Preferably, the polynucleotide is selected from 
one of the genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is not 
homologous to any gene from GAS or pneumococcus and which varies among 
clinical isolates. In another embodiment, the polynucleotide is a GBS polynucleotide 
which is not homologous to any gene from GAS or pneumococcus and which is 
homologous to at least one gene from at least one of the clinical isolates identified in 
Table 5. In another embodiment, the polynucleotide is a GBS polynucleotide which 
is not homologous to any gene from GAS or pneumococcus and which is homologous 
to at least one gene from each of the clinical isolates identified in Table 5. Preferably, 
the polynucleotide is selected from one of the genes listed in Table 7. 

The invention further provides an immunogenic composition comprising a 
polypeptide, or a fragment or derivative thereof, which is encoded by a polynucleotide 
sequence which is conserved across one or more clinical isolates of a Streptococcal 
species. Preferably, the polynucleotide is conserved across one or more Streptococcal 
clinical isolates selected from the Streptococcal species GBS, GAS and 
pneumococcus. More preferably, the polynucleotide is conserved across one or more 
GBS clinical isolates identified in Table 5. Still more preferably, the polynucleotide 
is conserved across one or more clinical isolates having one or more genes selected 
from the genes listed in Table 7. 

The invention further provides for an immunogenic composition comprising a 
polypeptide encoded by a polynucleotide selected from one or more of the Subsets of 
the invention. 

The invention provides a method for raising an immune response in a patient 
by administering any one of the immunogenic compositions set forth above. The 
choice of immunogenic composition means that the immune response may be reactive 
against all three of GAS, GBS and pneumococcus, may be reactive against only two of 
the three, or may be reactive only against GBS. 

The immune response is preferably an antibody response. It may be a 
protective immune response. The patient is preferably a human. 

Essential genes and knockouts 

The invention provides a Streptococcus bacterium wherein one or more genes 
within any of the Subsets of this invention have been knocked out. The choice of 
Subset means that the knocked out gene may be, for instance, a gene found in GBS 
but not in GAS or pneumococcus (e.g. which is involved in the pathogenesis of GBS, 
but not in the pathogenesis of GAS or pneumococcus, such as binding GBS cellular 
targets). 

Techniques for producing knockout bacteria are well known, and knockout 
Streptococci of various species have been reported [e.g. Margolis et al. (2001) 
Antimicrob. Agents Chemother. 45:2432-2435; Zhang et al. (2000) Cell 102:827-837; 
Nizet et al. (2000) Infect. Immun. 68:4245-4254; Nizet et al. (1997) Adv. Exp. Med. 
Biol. 418:627-630; etc.]. 

The knockout mutation may be situated in the coding region of the gene or 
may lie within its transcriptional control regions (e.g. within its promoter). 

The knockout mutation will reduce the level of mRNA encoding the 
corresponding polypeptide to <1% of that produced by the wild-type bacterium, 
preferably <0.5%, more preferably <0.1%, and most preferably to 0%. 

The knockout mutants of the invention may be used as immunogenic 
compositions (e.g. as vaccines) to prevent streptococcal infection. Such a vaccine 
may include the mutant as a live attenuated bacterium. 

The knockout mutants of the invention may be used to determine whether 
genes are essential for bacterial survival, either under normal or stress conditions. 

Antisense 

The invention provides a single-stranded nucleic acid comprising a fragment 
of x1 or more nucleotides from a nucleotide sequence selected from one of the Subsets 
of the invention. The choice of group means that the nucleic acid may be 
complementary to a gene sequence found in GBS, GAS and pneumococcus, or a gene 
sequence specific to GBS. 

The single-stranded nucleic acid is at least x1 nucleotides long. The value of x1 
is at least 7 (e.g. 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 
50 etc.). The single-stranded nucleic acid may be at most x2 nucleotides long, 
wherein x2 is 100 or less (e.g. 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 
84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 
61, 60). 

The nucleic acid is preferably of the formula 5'-(N)a-(X)-(N)b-3', wherein 
0≤a≤15, 0≤b≤15, N is any nucleotide, and X is the fragment as defined above. The 
values of a and b may independently be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 
15. Each individual nucleotide N in the -(N)a- and -(N)b- portions of the nucleic acid 
may be the same or different. The length of the nucleic acid (i.e. a+b+x1) is 
preferably x2 or less. 
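
The length constraints on the single-stranded nucleic acid can be checked mechanically. The Python sketch below is a hypothetical illustration: it reads the formula as a fragment X of at least x1 nucleotides flanked by up to 15 arbitrary nucleotides on each side (a on the 5' side, b on the 3' side), with a total length of at most x2. The function names, the default values x1 = 7 and x2 = 100, and the example sequences are chosen for illustration only and are not taken from the Subsets.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def build_oligo(fragment, five_prime="", three_prime="", x1=7, x2=100):
    """Assemble 5'-(N)a-(X)-(N)b-3' and enforce the stated length limits.

    fragment    -- X, which must be at least x1 nucleotides long
    five_prime  -- the (N)a flank, 0 to 15 nucleotides
    three_prime -- the (N)b flank, 0 to 15 nucleotides
    The total length a + b + len(X) must not exceed x2.
    """
    a, b = len(five_prime), len(three_prime)
    if len(fragment) < x1:
        raise ValueError("fragment X is shorter than x1")
    if a > 15 or b > 15:
        raise ValueError("flanks a and b must each be 0 to 15 nucleotides")
    oligo = five_prime + fragment + three_prime
    if len(oligo) > x2:
        raise ValueError("total length exceeds x2")
    return oligo


# Hypothetical 10-nucleotide stretch of a target gene (sense strand).
target = "ATGGCTAAAG"
antisense_fragment = reverse_complement(target)  # complementary to the target
print(build_oligo(antisense_fragment, five_prime="GG", three_prime="A"))
```

Taking the reverse complement is what makes the fragment complementary to the gene sequence, as an antisense use requires; the flanks are free to be any nucleotides.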

Antisense inhibition of streptococcal gene expression is known, e.g. Sato et al. 
(1998) FEMS Microbiol Lett 159:241-245. Antibacterial antisense techniques are 
also disclosed in international patent applications WO99/02673 and WO99/13893. 

The single-stranded nucleic acid may reduce the level of polypeptide 
expression from the complementary gene to <1% of that produced by the wild-type
bacterium, preferably <0.5%, more preferably <0.1%, and most preferably to 0%. 

Antisense experiments may be used to determine whether genes are essential 
for bacterial survival, either under normal or stress conditions. 

Screening methods 

The invention provides a method for screening compounds, wherein the 
method involves contacting the compounds with a polypeptide expressed by one or 
more of the polynucleotides selected from one of the Subsets of the invention. The 
method may be for screening for agonists of the polypeptides, antagonists, antibiotics 
etc. The choice of group means, for instance, that the method may be used for
identifying an antibiotic with broad anti-streptococcal activity, or
for identifying an antibiotic specific to GBS.

Potential compounds for screening include small organic molecules, peptides, 
peptoids, polypeptides, lipids, metals, nucleotides, nucleosides, aptamers, polyamines, 
antibodies, and derivatives thereof. Small organic molecules have a molecular weight 
between 50 and about 2,500 daltons, and most preferably in the range 200-800 
daltons. Complex mixtures of substances, such as extracts containing natural 
products, compound libraries or the products of mixed combinatorial syntheses also 
contain potential antagonists. 

Typically, a polypeptide is incubated with a test compound, and the mixture is
then tested to see if the polypeptide and test compound interact, or to see if the 
polypeptide's activity is inhibited. 

For preferred high-throughput screening methods, all the biochemical steps for 
this assay are performed in a single solution in, for instance, a test tube or microtitre 
plate, and the test compounds are analysed initially at a single compound 
concentration. For the purposes of high throughput screening, the experimental 
conditions are adjusted to achieve a proportion of test compounds identified as 
"positive" compounds from amongst the total compounds screened. 

The invention also provides a compound identified using these methods. 
These can be used to treat or prevent streptococcal infection. The compound 
preferably has an affinity for the adhesion-specific protein of at least 10^-7 M e.g. 10^-8
M, 10^-9 M, 10^-10 M or tighter.

Distinguishing Streptococcal species 

The invention provides a method for determining whether a Streptococcus 
bacterium of interest is or is not in the species agalactiae, pyogenes or pneumoniae,
comprising the step(s) of: (a) contacting the bacterium with a nucleic acid probe 
comprising the sequence of a gene selected from one of the Subsets of the invention; 
and/or (b) contacting the bacterium with an antibody which binds to a polypeptide 
encoded by one or more of the polynucleotides of one or more of the Subsets of the 
invention. The choice of group means, for instance, that the method may be used for 
distinguishing GBS from GAS and from pneumococcus, or for confirming that a 
bacterium is not a GAS or pneumococcus. 

The method will typically include the further step of detecting the presence or 
absence of an interaction between the bacterium of interest and the nucleic acid or 
protein. 

The bacterium of interest may be in a cell culture, for example, or may be
within a biological sample believed or known to contain a streptococcus. It may be 
intact or may be, for instance, lysed. 

The term "biological sample" encompasses a variety of sample types obtained 
from an organism and can be used in a diagnostic or monitoring assay. The term 
encompasses blood and other liquid samples of biological origin, solid tissue samples, 
such as a biopsy specimen or tissue cultures or cells derived therefrom and the 
progeny thereof. The term encompasses samples that have been manipulated in any 
way after their procurement, such as by treatment with reagents, solubilization, or 
enrichment for certain components. The term encompasses a clinical sample, and also 
includes cells in cell culture, cell supernatants, cell lysates, serum, plasma, biological 
fluids, and tissue samples. 

GBS 2603 Type V Genomic Sequence 

Applicants have determined the complete genome sequence of GBS clinical
type V isolate 2603 V/R and performed comparative analyses of this
sequence against other GBS strains, other species of pathogenic Streptococci and
other known bacterial species. The entire genomic sequence is available as of
the filing date of this application at http://www.tigr.org. This genomic sequence is
incorporated herein by reference in its entirety. The genomic sequence of GBS type
V isolate 2603 V/R is also set forth in International Patent Application WO 02/34771.

In one embodiment, the invention relates to the polynucleotides, and 
fragments and derivatives thereof, set forth in the GBS clinical type V isolate 2603 
which are not disclosed within WO 02/34771. The invention further relates to 
polypeptides expressed by the polynucleotides of the invention. 

Applicants have predicted that the GBS 2603 isolate contains approximately
2,176 genes. Each predicted gene is set forth in Table 1, listed by a
SAGxxxx ORF number. Table 1 also includes the predicted amino acid size of the
predicted expressed protein and the predicted function, if known. The sequence of
each SAG reference can be obtained at the TIGR website.

Figure 1 is a circular representation of the GBS genome and comparative 
hybridisations using microarrays. The outer circle represents predicted coding 
regions on the plus strand color coded by role categories: violet indicating amino acid 
biosynthesis; light blue indicating biosynthesis of cofactors, prosthetic groups, and 
carriers; light green indicating cell envelope; red indicating cellular processes; brown 
indicating central intermediary metabolism; yellow indicating DNA metabolism; light 
gray indicating energy metabolism; magenta indicating fatty acid and phospholipid 
metabolism; pink indicating protein synthesis and fate; orange indicating purines, 
pyrimidines, nucleosides, and nucleotides; olive indicating regulatory functions and 
signal transduction; dark green indicating transcription; teal indicating transport and 
binding proteins; gray indicating unknown function; salmon indicating other 
categories; blue indicating hypothetical proteins. 

The second circle represents predicted coding regions on the minus strand. In 
the third circle, black represents atypical nucleotide composition curve; green 
represents most atypical regions; magenta represents insertion elements; red diamonds 
indicate rRNAs. 

Circles 4 - 22 represent comparative hybridisations of strain 2603 V/R with 19
GBS strains. Cy3/Cy5 (2603 V/R signal/test strain) ratio cutoffs were defined
arbitrarily as follows: Cy3/Cy5 = 1.0 - 3.0, gene present in the test strain (no color
added); Cy3/Cy5 = 3.0 - 10.0, ambiguous result (blue); Cy3/Cy5 > 10, gene absent in
test strain (red).

Circles 4-9 represent type Ia strains 090, 515, A909, Davis, and DK8.
Circles 10-11 represent type Ib strains S7 7357b and H36B. Circles 12 - 13
represent type II strains 18RS21 and DK21. Circles 14 - 18 represent type III strains COH1,
COH31, D136C, M732 and M781. Circle 19 represents type V strain CJB111.
Circles 20 - 21 represent type VIII strains SMU014 and JM9130013. Circle 22
represents nontypable (NT) strain CJB110. Throughout Figure 1, varying regions of
five or more consecutive genes are indicated by yellow bullets.

Figure 4 depicts a linear representation of the GBS genome. The location of 
predicted coding regions color-coded by biological role (see Figure 1) is displayed. 
Arrowed boxes represent the direction of transcription for each ORF. The number of 
membrane-spanning domains predicted by TopPred is displayed as lipid bi-layers on 
top of ORFs, only for those whose products have five or more predicted membrane 
spanning regions. Genes coding for rRNAs (16S, 23S, 5S) and tRNAs (clover leaf 
structure with number of genes) are indicated. Predicted Rho-independent
transcriptional terminators are represented by hairpins. 

ORFs were predicted by GLIMMER (See Delcher, et al., (1999) Nucleic
Acids Res. 27, 4636 - 4641 and Salzberg, et al., (1998) Nucleic Acids Res. 26, 544 -
548) trained with ORFs larger than 600 base pairs from the genomic sequence and
GBS genes available in GenBank. All predicted proteins larger than 30 amino acids
were searched against a nonredundant protein database. (See Fleischmann, et al.,
(1995) Science 269, 496 - 512). Frame-shifts and point mutations were detected and
corrected where appropriate; those remaining were annotated as "authentic frame-shift"
or "authentic point mutation". Protein membrane-spanning domains were
identified by TOPPRED (See Claros, et al., (1994) Comput. Appl. Biosci. 10, 685 -
686). Candidate lipoprotein signal peptides (See Hayashi et al., (1990) J. Bioenerg.
Biomembr. 22, 451 - 471) were flagged by N-terminal exact matches to the pattern
{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C. Putative
signal peptides were identified by using SIGNALP (Nielsen, et al., (1997) Protein
Eng. 10, 1 - 6). Two sets of hidden Markov models were used to determine ORF
membership in families and superfamilies: PFAM Ver. 5.5 (Bateman, et al., (2000)
Nucleic Acids Res. 28, 263 - 266) and TIGRFAMS 1.0 (Haft et al., (2001) Nucleic
Acids Res. 29, 41 - 43). Domain-based paralogous families were built by performing
all-versus-all searches on the protein sequences by using a modified version of a
previously described method (Niermann, et al., (2001) Proc. Natl. Acad. Sci. USA
98, 4136 - 4141). Potential lineage-specific gene duplications were estimated by
identification of ORFs more similar to ORFs within the GBS genome than to ORFs
from other complete genomes. All ORFs were searched with FASTA3 (Pearson
(2000) Methods Mol. Biol. 132, 185 - 219) against all ORFs from the complete
genomes, and matches with a FASTA P value of 10^-15 were considered significant.
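
The lipoprotein signal peptide pattern given above is written in PROSITE-style notation, in which braces exclude residues, brackets list permitted residues and a parenthesised number gives a repeat count. A minimal sketch of the same pattern as a Python regular expression follows. The function name and the size of the N-terminal search window are assumptions made for this example; the specification does not state how far into the protein the match was permitted to extend.

```python
import re

# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C as a regular expression:
# six residues that are none of D, E, R, K, then the three bracketed classes,
# then the invariant cysteine.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipobox_cysteine(protein: str, n_terminal_window: int = 40):
    """Return the 0-based index of the lipobox cysteine, or None if absent.

    The default n_terminal_window of 40 residues is an assumption of this sketch.
    """
    match = LIPOBOX.search(protein[:n_terminal_window].upper())
    return match.end() - 1 if match else None
```

The test peptide used to exercise the function is hypothetical and is not taken from the GBS genome.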

The genome consists of a circular chromosome of 2,160,266 base pairs with a 
G+C content of 35.7%. Base pair one of the chromosome was assigned within the 
putative origin of replication. The genome contains 80 tRNAs, 7 rRNAs, and 3
sRNAs. Approximately 78% of the 2,176 predicted genes are transcribed in the same 
direction as that of DNA replication, a feature also observed in S. pn. and other low- 
GC Gram positive organisms. 

Biological roles were assigned to 1,409 (65%) of the predicted genes according to a
classification scheme adapted from Riley (1993) Microbiol Rev. 57, 862 - 952.
Another 527 predicted proteins (24%) matched proteins of unknown function, and the
remaining 240 (11%) had no database match. The expression of 50 of these
hypothetical proteins was confirmed by Western Blot analysis, and the proteins were
annotated as "proteins of unknown function." A total of 339 paralogous protein
families were identified in strain 2603, containing 941 predicted proteins (43% of the
total).
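
The percentages quoted above follow from the total of 2,176 predicted genes. The short Python check below reproduces them; the dictionary keys are descriptive labels chosen for this example only.

```python
TOTAL_GENES = 2176

# Counts as stated in the text; the labels are illustrative.
categories = {
    "assigned biological role": 1409,
    "match to protein of unknown function": 527,
    "no database match": 240,
}

# Percentage of the predicted gene complement, rounded to the nearest integer.
percentages = {name: round(100 * n / TOTAL_GENES) for name, n in categories.items()}

# Share of predicted proteins that fall into paralogous families.
paralog_share = round(100 * 941 / TOTAL_GENES)
```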

The Western Blot analysis was conducted as follows. GBS strain 2603 V/R
cells were grown in Todd-Hewitt broth (Difco) to OD600nm = 0.5. The culture was
centrifuged for 20 minutes at 5,000 rpm. The supernatant was discarded, and bacteria
were washed once with PBS, resuspended in 2 ml of 50 mM Tris-HCl pH 6.8,
containing 400 units of Mutanolysin (Sigma), and incubated 2 hours at 37°C. After
three cycles of freeze and thaw, cellular debris was removed by centrifugation at
14,000 rpm for 10 minutes, and the protein concentration of the supernatant was
measured by the Bio-Rad Protein assay, with BSA as a standard. Purified
recombinant proteins (50 ng) and total cell extracts (25 derived from GBS
serotype V 2603 V/R strain were separated by SDS/PAGE and electroblotted onto
nitrocellulose membranes for 1 hour at 100 V. The membranes were saturated by
overnight incubation at 4°C in 5% skimmed milk and 0.1% Tween 20 in PBS and
incubated for 1 hour at room temperature with sera from immunized mice diluted
1:500 - 1:1,000 in saturation buffer. To reduce background due to antibodies raised
against contaminating E. coli proteins, sera were preincubated with E. coli protein
extracts absorbed on nitrocellulose strips. The membranes were washed twice in 3%
skimmed milk and 0.1% Tween 20 in PBS and incubated for 1 hour with a 1:1,000
dilution of horseradish peroxidase-conjugated antimouse Ig (DAKO). After washing
with 0.1% Tween 20 in PBS, the membranes were developed with the Opti-4CN
Substrate Kit (Bio-Rad).

Table 2 comprises a list of predicted and experimentally characterized surface
and secreted proteins from GBS. Candidate signal peptides and lipoprotein motifs
were predicted with PSORT [Nakai, K. & Horton, P. (1999) Trends Biochem Sci 24,
34-6] and other methods (see methods). Sortase motifs (LPxTG) were detected using
the FINDPATTERNS program of the GCG Package [Devereux, J., Haeberli, P. &
Smithies, O. (1984) Nucleic Acids Res 12, 387-95] and hidden Markov models.
Column "Other" indicates proteins that carry other motifs (e.g. integrin-binding motif
RGD) or are similar to characterized surface-exposed proteins. Western blot results
were considered positive when the antibodies revealed a predominant band of the
expected molecular weight on the total protein extracts of S. agalactiae strain 2603
V/R. ORFs without + or - in this column were not tested in western blot. FACS
analyses were performed for western blot positive proteins only. Western blot and
FACS data are displayed only for proteins carrying at least one of the other motifs
shown in the table. Column "GBS specific" indicates genes unique to S. agalactiae
(when compared to other completely sequenced genomes) that are present in all the
S. agalactiae strains tested in comparative genome hybridization analyses. Finally,
only proteins carrying fewer than 3 predicted transmembrane domains are shown in the
table; other proteins are likely to be embedded in the cytoplasmic membrane and are
probably not exposed on the organism's surface.

FACS data were collected as follows: GBS 2603 V/R strain cells were grown
in Todd-Hewitt broth (Difco) to OD600nm = 0.5. The culture was centrifuged for 20
minutes at 5,000 rpm, and bacteria were washed once with PBS, resuspended in PBS
containing 0.05% paraformaldehyde, and incubated for 1 hour at 37°C and then
overnight at 4°C. Fifty microliters of fixed bacteria (OD600nm 0.1) was washed once
with PBS, resuspended in 20 µl of newborn calf serum (Sigma), and incubated for 1
hour at 4°C in 100 µl of preimmune or immune sera diluted 1:200 in dilution
buffer (PBS, 20% newborn calf serum, 0.1% BSA). After centrifugation and washing
with 200 µl of washing buffer (0.1% BSA in PBS), samples were incubated for 1 hour
at 4°C with 50 µl of R-phycoerythrin-conjugated F(ab)2 goat anti-mouse IgG
(Jackson ImmunoResearch) diluted 1:100 in dilution buffer. Cells were washed with
200 µl of washing buffer and resuspended in 200 µl of PBS. Samples were analysed
by using a FACSCalibur apparatus (Becton Dickinson), and data were analyzed by
using CELL QUEST (Becton Dickinson). A shift in mean fluorescence intensity of
>75 channels compared with preimmune sera from the same mice was considered
positive. This cutoff was determined from the mean plus two standard deviations of
shifts obtained with control sera raised against mock purified recombinant proteins
from cultures of E. coli carrying the empty expression vector and included in every
experiment. Artifacts due to bacterial lysis were excluded by using antisera raised
against six different known cytoplasmic proteins, all of which gave negative results.
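
The positivity rule described above (a cutoff set at the mean plus two standard deviations of control shifts, and a call of positive when the immune serum shift exceeds the cutoff) may be sketched as follows. The function names are assumptions for this example, and the use of the sample standard deviation is also an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def control_cutoff(control_shifts):
    """Mean plus two sample standard deviations of the control shifts."""
    return mean(control_shifts) + 2 * stdev(control_shifts)


def is_facs_positive(immune_mfi, preimmune_mfi, cutoff=75.0):
    """True when the shift in mean fluorescence intensity exceeds the cutoff.

    The default of 75 channels is the value reported in the text.
    """
    return (immune_mfi - preimmune_mfi) > cutoff
```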

Regions of Atypical Nucleotide Composition

These regions were identified by χ² analysis: the distribution of all 64
trinucleotides (3-mers) was computed for the complete genome in all six reading
frames, followed by the 3-mer distribution in 2,000-bp windows. Windows
overlapped by 1,000 bp. For each window, the χ² statistic on the difference between
its 3-mer content and that of the whole genome was computed.
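
The windowed trinucleotide analysis may be sketched in Python as follows. This is an illustrative reading of the description rather than the program actually used: the function names are assumptions, and "all six reading frames" is interpreted here as counting overlapping 3-mers on both strands.

```python
from collections import Counter
from itertools import product

TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def trinucleotide_counts(seq):
    """Count overlapping 3-mers on both strands (all six reading frames)."""
    both = (seq, seq.translate(_COMPLEMENT)[::-1])
    counts = Counter(s[i:i + 3] for s in both for i in range(len(s) - 2))
    return [counts[k] for k in TRINUCLEOTIDES]


def chi_square_windows(genome, window=2000, step=1000):
    """Return (window start, chi-square score) for each overlapping window."""
    genome = genome.upper()
    whole = trinucleotide_counts(genome)
    total = sum(whole)
    freqs = [c / total for c in whole]  # genome-wide 3-mer frequencies
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        observed = trinucleotide_counts(genome[start:start + window])
        n = sum(observed)
        # Expected count of each 3-mer in the window is n * genome frequency.
        chi2 = sum((o - n * f) ** 2 / (n * f)
                   for o, f in zip(observed, freqs) if n * f > 0)
        scores.append((start, chi2))
    return scores
```

Windows with the highest scores correspond to the regions of most atypical composition.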

In Silico Genome Comparisons

The protein sets of S. agalactiae, Streptococcus pneumoniae and S. pyogenes
were compared by using FASTA3. A general description of the FASTA3 sequence
comparison program is given in Pearson, W.R., "Flexible Sequence Similarity
Searching with the FASTA3 Program Package", (2000) Methods Mol. Biol. 132:
185-219. Shared genes were defined using a FASTA3 P value cutoff of 10^-15. These
shared genes and genes that S. agalactiae did not share with the other streptococci
using this cutoff were subsequently searched against all completely sequenced
genomes, and genes were defined as unique to streptococci or S. agalactiae when they
did not share similarity with any other gene sets with a FASTA3 P value of 10^-5 or
lower. The use of two cutoffs provides for a more stringent analysis of shared or
unique genes.
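
The two-cutoff scheme may be sketched as follows. This is a simplified illustration under stated assumptions: the function name and its inputs are hypothetical, and the caller is assumed to supply, for each gene, the best FASTA3 P value against each other streptococcus and the best P value against the remaining completely sequenced genomes (None where no match was reported).

```python
SHARED_CUTOFF = 1e-15   # P value at or below which a gene counts as shared
UNIQUE_CUTOFF = 1e-5    # any outside match at or below this P value rules out uniqueness


def classify_gene(strep_p_values, best_outside_p):
    """Return (species the gene is shared with, scope label).

    strep_p_values maps species name to best P value, or None if no match.
    best_outside_p is the best P value against the other genomes, or None.
    """
    shared_with = sorted(
        species for species, p in strep_p_values.items()
        if p is not None and p <= SHARED_CUTOFF
    )
    if best_outside_p is not None and best_outside_p <= UNIQUE_CUTOFF:
        scope = "also found outside streptococci"
    elif shared_with:
        scope = "unique to streptococci"
    else:
        scope = "unique to S. agalactiae"
    return shared_with, scope
```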

Figure 2 is a schematic representation of in silico comparisons between 
streptococci. The protein sets of GBS, S. pn., and GAS were compared by using 
FASTA3. Numbers under the species name indicate genes that are not shared with 
the other species; values in parentheses are the number of proteins in each species
(excluding frame-shifted and degenerated genes). Numbers in the intersections
indicate genes shared by two or three species. These are displayed in the color
corresponding to the species used as the query (GBS: green; S.pn.: blue; GAS:
red). Numbers in any given intersection are slightly different due to gene duplications
in some species. 

Table 3 lists genes which were shared among GBS, GAS and pneumococcus,
but which were not found in any of the other completely sequenced genomes. The
protein sets of S. agalactiae, S. pneumoniae, and S. pyogenes were compared using
FASTA3 [Pearson, W. R. (2000) Methods Mol Biol 132, 185-219]. Shared genes
were defined using a FASTA3 P value cutoff of 10^-15. These shared genes and genes
that S. agalactiae did not share with the other streptococci using this cutoff were
subsequently searched against all completely sequenced genomes and genes were
defined as unique to streptococci or S. agalactiae when they did not share similarity
with any other gene sets with a FASTA3 P value of 10^-5 or lower.



Regions of conservation of gene synteny were computed as windows of 10 kb 
spanning at least three genes whose order was conserved in the other species. 
Regions were merged if they were less than 20 kb apart. The number of genes within
each broad region was then calculated. 
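
The merging step may be illustrated by the following Python sketch, which joins syntenic windows separated by less than 20 kb. Only the merging rule is shown; the detection of conserved gene order within each 10 kb window is not reproduced here. The function name and the representation of a region as a (start, end) pair of base-pair coordinates are assumptions for this example.

```python
def merge_regions(regions, max_gap=20_000):
    """Merge (start, end) regions whose separation is less than max_gap bp."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```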

Comparative Genome Hybridizations 

Comparative genome hybridizations (See Figure 1) using DNA microarrays
were performed between the sequenced type V strain 2603 V/R and 19 other GBS
strains of multiple serotypes (See Table %). Predicted genes from strain 2603 V/R
were amplified by PCR and arrayed on glass microscope slides. See Peterson, et al.,
(2000) J. Bacteriol. 182, 6192-6202. Genomic DNA was labelled according to
protocols provided by J. DeRisi (www.microarrays.org/Pdfs/GenomicDNALabel_B.pdf),
except that the DNA was not digested or sheared before labelling.
Arrays were scanned with a GENEPIX 4000B scanner (Axon Instruments, Foster
City, CA), and individual hybridisation signals were quantitated with TIGR
SPOTFINDER. See Hedge, et al., (2000), Biotechniques 29, 548-550, 552-554, 556.
Cy3/Cy5 (2603 V/R signal/test strain) ratio cutoffs were defined arbitrarily as
Cy3/Cy5 = 1.0 - 3.0, gene present in test strain; 3.0 - 10.0, ambiguous result; >10.0,
gene absent. For ambiguous results, the gene may be divergent in the test strain
relative to 2603 V/R, or the gene may be absent in the test strain but still produce
signal owing to a paralogous gene family or a repetitive element. Although the cutoffs
are arbitrary, they agree well with the results for the variation of the capsule locus
in the strains tested (see region 9 on Figure 1), where most genes are slightly
divergent and only a few are completely different.
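
The ratio cutoffs may be expressed as a simple classification function. In the Python sketch below, the function name is an assumption, as is the handling of cases the text leaves open: a ratio of exactly 3.0 is treated as ambiguous, and a ratio below 1.0 is treated as present.

```python
def classify_cgh_ratio(cy3_cy5_ratio):
    """Classify a gene in a test strain from its Cy3/Cy5 ratio.

    Cy3 is the 2603 V/R reference signal and Cy5 is the test strain signal.
    """
    if cy3_cy5_ratio > 10.0:
        return "absent"
    if cy3_cy5_ratio >= 3.0:
        return "ambiguous"
    return "present"
```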

The CGH detected 1,698 genes in all of the strains, whereas 401 genes from
strain 2603 V/R (18% of the gene complement) were not detected in at least one other 
strain, suggesting that they are absent or significantly divergent in those strains. Two 
hundred sixty (38%) of the 683 genes specific to S. agalactiae when compared with 
the other two streptococci (Fig. 2), including virulence determinants and surface 
proteins, vary among S. agalactiae strains, whereas only 47 (4%) of the genes
common to all three streptococcal species, including 5 of the 6 sortases identified in 
the genome, vary among strains. Thus, the in silico analysis of genes shared by the 
streptococci that are not expected to vary among this genus is consistent with the 
CGH analysis. Forty-four (25%) of the genes shared by S. agalactiae and S.
pneumoniae and 44 (20%) of those shared by S. agalactiae and S. pyogenes vary in
the CGH analysis. The first set contains many glycosyl transferases and proteins 
carrying a cell-wall anchor, whereas the second set displays many phage-related 
genes. One hundred thirty-six of the 315 genes unique to S. agalactiae when 
compared with all sequenced genomes vary among strains. These include R5, three 
capsular genes, two cell wall-anchored proteins, and three transcriptional regulators. 
Three hundred sixty-four (91%) of the 401 varying genes correspond to 15 regions 
containing more than 5 contiguous genes. Ten of these regions display an atypical 
nucleotide composition in strain 2603 V/R (Fig. 1), consistent with the possibility that 
they were horizontally transferred into this strain. Two of the largest regions (region 
4, a prophage and region 7, similar to Tn916 from Enterococcus faecalis) are flanked 
by insertion sequence elements. The 15 regions contain many proteins predicted to be 
anchored on the cell wall or surface exposed, including Rib (region 3), sortases, 
glycosyl transferases, the capsule locus (region 9, divergent in all strains but the other 
type V strain CJB111), and phage-related genes. Region 14 is unique to S. agalactiae
and spans 33 genes (SAG1989- SAG2021), including 25 proteins of unknown 
function, some of which carry a cell-wall anchor. It is flanked by an ISL3 transposase 
and displays an atypical nucleotide composition. Region 1, unique to S. agalactiae, is
a possible plasmid or remnant of a phage (SAG0218-SAG0238), contains mostly
hypothetical proteins, and is flanked by a site-specific recombinase. Region 8 is
specific to S. agalactiae, comprises 20 proteins of unknown function (SAG1018-
SAG1037), most of which are predicted to be membrane associated or secreted, and 
displays an atypical nucleotide composition. 
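
Regions of contiguous varying genes such as those described above can be located by scanning the genes in chromosomal order. The following Python sketch is illustrative only; the function name, the boolean representation of "varies in at least one strain", and the default minimum run length of six genes (i.e. more than 5) are assumptions for this example.

```python
from itertools import groupby


def varying_regions(varies, min_len=6):
    """Return (first index, last index) for each run of varying genes.

    `varies` is a list of booleans in chromosomal gene order; only runs of
    at least min_len consecutive True values are reported.
    """
    regions = []
    index = 0
    for is_varying, run in groupby(varies):
        length = len(list(run))
        if is_varying and length >= min_len:
            regions.append((index, index + length - 1))
        index += length
    return regions
```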

The CGH results were analyzed by profile clustering where genes are grouped
based on their distribution patterns (Fig. 5). Sixteen clusters of five or more 
contiguous and noncontiguous genes comprising a total of 300 genes were identified 
(Table 6). Several clusters correspond to regions of contiguous genes described 
above. Some clusters of genes that do not share sequence similarity and are located at 
different loci in the genome display an identical profile. For instance, a cluster of 
genes containing a surface antigen (SAG0674-SAG0681) follows the same
distribution as another cluster containing only hypothetical proteins (SAG0247- 
SAG0249). A putative pathogenicity protein (SAG2063) also clusters with a region 
containing several glycosyl transferases and Sec proteins (SAG1447-SAG1462). 

Profile clustering was also used to group strains based on similarity of gene 
content (Fig. 5). In addition, the sequences of 19 genes from each of 11 S. agalactiae
strains were determined after PCR amplification and used for phylogenetic analyses.
The strains were the following: type Ia, 090 and A909; type Ib, H36B; type II,
18RS21; type III, COH1, M732 and M781; type V, 2603 V/R and 1169NT1; type
VIII, JM9130013; and nontypeable strain CJB110. The set comprised 8
housekeeping genes and 11 genes coding for proteins predicted to be surface-exposed
(Table 7).

The profile clustering was conducted as follows. The information on presence and absence
of genes based on the comparative genome hybridisation results was used to group
genes based on their distribution patterns. The analysis used was essentially identical
to that used for phylogenetic profile analysis. See Pellegrinie, et al., (1999) Proc.
Natl. Acad. Sci. USA 96, 4285 - 4288. Each gene was assigned a binary profile based
on its presence or absence across the different strains, with presence determined by a
Cy3/Cy5 ratio < 3.0 and absence > 3.0. The gene profiles were then clustered by
using the single-linkage clustering algorithm with column weighting (all with default
settings) of CLUSTER (http://rana.lbl.gov). The CLUSTER program also groups the
strains (columns) based on similarity of gene profiles. Clusters of genes and strains
were viewed by using TREEVIEW (http://rana.lbl.gov).

Phylogenetic trees were inferred for the complete set of 19 genes and for the
subsets of housekeeping and surface-exposed genes. Because the branching patterns
in all three trees were identical, only the tree of the 19 genes is shown in Fig. 3. The
degree of polymorphism of the housekeeping and the surface-exposed genes is similar
(~1 variable site among all of the strains per 100 bp).

The sequences of genes from the different strains were aligned by using
CLUSTALW (See Thompson (1994), Nucleic Acids Res. 22, 4673 - 4680) and
trimmed to remove ambiguously aligned regions. Phylogenetic trees of individual
genes and of concatenated alignments of multiple genes were inferred by using
maximum likelihood methods of PAUP* 4.0b10 (Sinauer, Sunderland, MA).
Bootstrap analysis was carried out using PAUP* as well. The possibility of
recombination among strains was examined by analysis of sequence variation
using SIMPLOT (S.C. Ray) and analysis of phylogenetic heterogeneity by using
MACCLADE (Sinauer).

Analysis of this variation showed no evidence for major recombination events
between the strains. There were no long stretches of polymorphic sites that strongly
supported other trees (analysis with MACCLADE), and there were no significant
crossover events in plots of sequence similarity between strains (analysis with
SIMPLOT). Some strain groupings (clades) generated by phylogenetic analysis were
similar to clusters from the profile analysis (type III strains M781, M732 and COH1;
type Ia strain 090 and nontypable strain CJB110), whereas others were different,
possibly because of the aforementioned problems with the profile clustering. In both
the phylogenetic analysis and the profile clustering, there is serotype-dependent and
-independent clustering (Figs. 3 and 5). The presence of strains of the same serotype
in different clades or clusters could be due to lateral gene transfer.

Figure 5 demonstrates phylogenetic profiling of GBS strains based on 
comparative genome hybridisations. The information on presence and absence of 
genes based on the microarray comparative genome hybridization results was used for 
phylogenetic profile analysis. The presence of a particular gene or gene cluster is 
indicated in the figure by a red square and the absence of a gene or cluster by a black 
square. The relationship between strains based on this analysis is depicted by the tree 
at the top of the figure. The strains and their serotypes are indicated (NT: 
nontypeable). Clusters with identical profiles are reduced to a single horizontal line 
and the number of genes in each cluster is indicated on the right. The clusters of 5 or 
more genes, labeled in red text and numbered, are listed in Table 6. The 1698 genes 
shared by all 19 strains are labeled in green text. 
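
The conversion of hybridisation ratios into binary profiles, and the collapsing of genes with identical profiles into a single row as in Figure 5, may be sketched as follows. This sketch deliberately covers only those two steps; it is not the single-linkage clustering performed by the CLUSTER program. The function names are assumptions, the gene identifiers in any usage are hypothetical, and a ratio of exactly 3.0 is treated as absent, which the text leaves undefined.

```python
from collections import defaultdict


def binary_profile(ratios, cutoff=3.0):
    """Map per-strain Cy3/Cy5 ratios to 1 (present, ratio < cutoff) or 0 (absent)."""
    return tuple(1 if ratio < cutoff else 0 for ratio in ratios)


def group_identical_profiles(gene_ratios):
    """Group gene identifiers that share exactly the same binary profile.

    gene_ratios maps a gene identifier to its list of ratios, one per strain,
    with strains in the same order for every gene.
    """
    groups = defaultdict(list)
    for gene, ratios in gene_ratios.items():
        groups[binary_profile(ratios)].append(gene)
    return dict(groups)
```

Genes that are located at different loci but fall into the same group correspond to the identical-profile clusters discussed in the text.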

Figure 3 depicts a phylogenetic tree of GBS strains based on PCR sequences. 
The sequences of 19 genes (Table 7) from each of 11 GBS strains were aligned and 
trimmed to remove ambiguously aligned regions, and phylogenetic trees were 
inferred. Strain names are indicated in bold, and serotypes are indicated under the
strain names. Bootstrap values are indicated on the branches. 

Techniques 

A summary of standard techniques and procedures which may be employed in 
order to perform the invention (e.g. to utilise the disclosed sequences for vaccination 
or diagnostic purposes) follows. This summary is not a limitation on the invention, 
but gives examples that may be used, but are not required. 



General

The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of molecular biology, microbiology, recombinant DNA, and
immunology, which are within the skill of the art. Such techniques are explained fully in the
literature e.g. Sambrook Molecular Cloning: A Laboratory Manual, Second Edition (1989) or
Third Edition (2000); DNA Cloning, Volumes I and II (D.N. Glover ed. 1985); Oligonucleotide
Synthesis (M.J. Gait ed., 1984); Nucleic Acid Hybridization (B.D. Hames & S.J. Higgins eds.
1984); Transcription and Translation (B.D. Hames & S.J. Higgins eds. 1984); Animal Cell
Culture (R.I. Freshney ed. 1986); Immobilized Cells and Enzymes (IRL Press, 1986); B. Perbal,
A Practical Guide to Molecular Cloning (1984); the Methods in Enzymology series (Academic
Press, Inc.), especially volumes 154 & 155; Gene Transfer Vectors for Mammalian Cells (J.H.
Miller and M.P. Calos eds. 1987, Cold Spring Harbor Laboratory); Mayer and Walker, eds.
(1987), Immunochemical Methods in Cell and Molecular Biology (Academic Press, London);
Scopes, (1987) Protein Purification: Principles and Practice, Second Edition (Springer-Verlag,
N.Y.), and Handbook of Experimental Immunology, Volumes I-IV (D.M. Weir and C.C.
Blackwell eds 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification.

Further Definitions

A composition containing X is "substantially free" of Y when at least 85% by weight of the
total X+Y in the composition is X. Preferably, X comprises at least about 90% by weight of the
total of X+Y in the composition, more preferably at least about 95% or even 99% by weight.

The term "comprising" means "including" as well as "consisting" e.g. a composition
"comprising" X may consist exclusively of X or may include something additional e.g. X + Y.

The singular forms "a", "an", and "the" include plural referents unless the context clearly
dictates otherwise. Thus, for example, reference to "a polynucleotide" includes a plurality of
such polynucleotides and reference to "an epithelial cell" includes reference to one or more
cells and equivalents thereof known to those skilled in the art, etc.

The term "heterologous" refers to two biological components that are not found together in
nature. The components may be host cells, genes, or regulatory regions, such as promoters.
Although the heterologous components are not found together in nature, they can function
together, as when a promoter heterologous to a gene is operably linked to the gene. Another
example is where a Streptococcal sequence is heterologous to a mouse host cell. A further
example would be two epitopes from the same or different proteins which have been
assembled in a single protein in an arrangement not found in nature.

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of 
polynucleotides, such as an expression vector. The origin of replication behaves as an 
autonomous unit of polynucleotide replication within a cell, capable of replication under its 
own control. An origin of replication may be needed for a vector to replicate in a particular host 
cell. With certain origins of replication, an expression vector can be reproduced at a high copy 
number in the presence of the appropriate proteins within the cell. Examples of origins are the 
autonomously replicating sequences, which are effective in yeast; and the viral T-antigen, 
effective in COS-7 cells. 

A "mutant" sequence is defined as a DNA, RNA or amino acid sequence differing from but 
having sequence identity with the native or disclosed sequence. Depending on the particular 
sequence, the degree of sequence identity between the native or disclosed sequence and the 
mutant sequence is preferably greater than 50% (eg. 60%, 70%, 80%, 90%, 95%, 99% or more, 
calculated using the Smith-Waterman algorithm as described above). As used herein, an "allelic 
variant" of a nucleic acid molecule, or region, for which nucleic acid sequence is provided 
herein is a nucleic acid molecule, or region, that occurs essentially at the same locus in the 
genome of another or second isolate, and that, due to natural variation caused by, for example, 
mutation or recombination, has a similar but not identical nucleic acid sequence. A coding 
region allelic variant typically encodes a protein having similar activity to that of the protein 
encoded by the gene to which it is being compared. An allelic variant can also comprise an 
alteration in the 5' or 3' untranslated regions of the gene, such as in regulatory control regions 
(eg. see US patent 5,753,235). 
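
As an illustration of how a degree of sequence identity might be computed from a Smith-Waterman local alignment, a minimal sketch follows. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name `smith_waterman_identity` are assumptions chosen for the example; they are not the parameters referred to elsewhere in this specification, and identity is reported here over the columns of the best-scoring local alignment.

```python
def smith_waterman_identity(a: str, b: str, match: int = 1,
                            mismatch: int = -1, gap: int = -2) -> float:
    """Percent identity over the best-scoring local alignment of a and b.

    Scoring parameters are illustrative assumptions (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; cell values never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    identical = aligned = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        aligned += 1
    return 100.0 * identical / aligned if aligned else 0.0


# One substitution in an eight-base sequence: 7 of 8 aligned positions match.
print(smith_waterman_identity("ACGTACGT", "ACGTTCGT"))
```

Under these assumed parameters, a candidate sequence would meet the "greater than 50%" preference above if the returned value exceeds 50.
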

Expression systems 

The Streptococcal nucleotide sequences can be expressed in a variety of different expression 
systems; for example those used with mammalian cells, baculoviruses, plants, bacteria, and 
yeast. 

i. Mammalian Systems 

Mammalian expression systems are known in the art. A mammalian promoter is any DNA 
sequence capable of binding mammalian RNA polymerase and initiating the downstream (3') 
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a 
transcription initiating region, which is usually placed proximal to the 5' end of the coding 
sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription 
initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis 
at the correct site. A mammalian promoter will also contain an upstream promoter element, 
usually located within 100 to 200 bp upstream of the TATA box. An upstream promoter 
element determines the rate at which transcription is initiated and can act in either orientation 
[Sambrook et al. (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular 
Cloning: A Laboratory Manual, 2nd ed.]. 
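
The positional relationship described above (a TATA box usually 25-30 bp upstream of the transcription initiation site) can be illustrated with a short sketch. The function `tata_boxes_upstream`, the use of the bare motif "TATA", and the convention of measuring distance from the first base of the motif to the initiation site are all assumptions made for this example; real promoter analysis would use a fuller consensus and positional model.

```python
import re


def tata_boxes_upstream(sequence: str, tss_index: int,
                        min_dist: int = 25, max_dist: int = 30) -> list[int]:
    """Return 0-based start indices of 'TATA' motifs that begin
    min_dist..max_dist bp upstream of the transcription start site.

    `tss_index` is the 0-based index of the initiation site (an assumed input).
    """
    hits = []
    # The lookahead finds overlapping occurrences of the motif as well.
    for m in re.finditer(r"(?=TATA)", sequence.upper()):
        distance = tss_index - m.start()
        if min_dist <= distance <= max_dist:
            hits.append(m.start())
    return hits


# Toy sequence: a TATAAA element starting at index 10, initiation site at index 38.
toy = "G" * 10 + "TATAAA" + "G" * 22 + "A" + "C" * 10
print(tata_boxes_upstream(toy, tss_index=38))
```
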

Mammalian viral genes are often highly expressed and have a broad host range; therefore 
sequences encoding mammalian viral genes provide particularly useful promoter sequences. 
Examples include the SV40 early promoter, mouse mammary tumor virus LTR promoter, 
adenovirus major late promoter (Ad MLP), and herpes simplex virus promoter. In addition, 
sequences derived from non-viral genes, such as the murine metallothionein gene, also provide 
useful promoter sequences. Expression may be either constitutive or regulated (inducible), 
depending on the promoter; an inducible promoter can, for example, be induced with 
glucocorticoid in hormone-responsive cells. 
The presence of an enhancer element (enhancer), combined with the promoter elements 
described above, will usually increase expression levels. An enhancer is a regulatory DNA 
sequence that can stimulate transcription up to 1000-fold when linked to homologous or 
heterologous promoters, with synthesis beginning at the normal RNA start site. Enhancers are 
also active when they are placed upstream or downstream from the transcription initiation site, 
in either normal or flipped orientation, or at a distance of more than 1000 nucleotides from the 
promoter [Maniatis et al. (1987) Science 236:1237; Alberts et al. (1989) Molecular Biology of 
the Cell, 2nd ed.]. Enhancer elements derived from viruses may be particularly useful, because 
they usually have a broader host range. Examples include the SV40 early gene enhancer 
[Dijkema et al. (1985) EMBO J. 4:761] and the enhancer/promoters derived from the long 
terminal repeat (LTR) of the Rous Sarcoma Virus [Gorman et al. (1982b) Proc. Natl. Acad. Sci. 
79:6777] and from human cytomegalovirus [Boshart et al. (1985) Cell 41:521]. Additionally, 
some enhancers are regulatable and become active only in the presence of an inducer, such as a 
hormone or metal ion [Sassone-Corsi and Borelli (1986) Trends Genet. 2:215; Maniatis et al. 
(1987) Science 236:1237]. 

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence 
may be directly linked with the DNA molecule, in which case the first amino acid at the N- 
terminus of the recombinant protein will always be a methionine, which is encoded by the ATG 
start codon. If desired, the N-terminus may be cleaved from the protein by in vitro incubation 
with cyanogen bromide. 
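
Cyanogen bromide cleaves a polypeptide chain on the C-terminal side of methionine residues, so the treatment mentioned above also cuts after any internal methionine. The following sketch illustrates this on a one-letter amino acid string; the function name `cnbr_fragments` and the example peptides are hypothetical, and the model ignores reaction efficiency and the chemical conversion of the methionine residue.

```python
def cnbr_fragments(protein: str) -> list[str]:
    """Split a one-letter amino acid sequence after every methionine (M).

    Simplified model of cyanogen bromide cleavage: complete cleavage is assumed.
    """
    fragments, current = [], ""
    for residue in protein.upper():
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


# Only the initiator methionine is released when no internal methionine is present.
print(cnbr_fragments("MKTAYIAKQR"))
# An internal methionine yields an additional cut within the protein.
print(cnbr_fragments("MKTMAY"))
```
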

Alternatively, foreign proteins can also be secreted from the cell into the growth media by 
creating chimeric DNA molecules that encode a fusion protein comprised of a leader sequence 
fragment that provides for secretion of the foreign protein in mammalian cells. Preferably, there 
are processing sites encoded between the leader fragment and the foreign gene that can be 
cleaved either in vivo or in vitro. The leader sequence fragment usually encodes a signal peptide 
comprised of hydrophobic amino acids which direct the secretion of the protein from the cell. 
The adenovirus tripartite leader is an example of a leader sequence that provides for secretion of 
a foreign protein in mammalian cells. 

Usually, transcription termination and polyadenylation sequences recognized by mammalian 
cells are regulatory regions located 3' to the translation stop codon and thus, together with the 
promoter elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed 
by site-specific post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 
41:349; Proudfoot and Whitelaw (1988) "Termination and 3' end processing of eukaryotic 
RNA." In Transcription and splicing (ed. B.D. Hames and D.M. Glover); Proudfoot (1989) 
Trends Biochem. Sci. 14:105]. These sequences direct the transcription of an mRNA which can 
be translated into the polypeptide encoded by the DNA. Examples of transcription 
terminator/polyadenylation signals include those derived from SV40 [Sambrook et al. (1989) 
"Expression of cloned genes in cultured mammalian cells." In Molecular Cloning: A 
Laboratory Manual]. 

Usually, the above described components, comprising a promoter, polyadenylation signal, and 
transcription termination sequence are put together into expression constructs. Enhancers, 
introns with functional splice donor and acceptor sites, and leader sequences may also be 
included in an expression construct, if desired. Expression constructs are often maintained in a 
replicon, such as an extrachromosomal element (eg. plasmids) capable of stable maintenance in 
a host, such as mammalian cells or bacteria. Mammalian replication systems include those 
derived from animal viruses, which require trans-acting factors to replicate. For example, 
plasmids containing the replication systems of papovaviruses, such as SV40 [Gluzman (1981) 
Cell 23:175] or polyomavirus, replicate to extremely high copy number in the presence of the 
appropriate viral T antigen. Additional examples of mammalian replicons include those derived 
from bovine papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two 
replication systems, thus allowing it to be maintained, for example, in mammalian cells for 
expression and in a prokaryotic host for cloning and amplification. Examples of such 
mammalian-bacteria shuttle vectors include pMT2 [Kaufman et al. (1989) Mol. Cell. Biol. 
9:946] and pHEBO [Shimizu et al. (1986) Mol. Cell. Biol. 6:1074]. 

The transformation procedure used depends upon the host to be transformed. Methods for 
introduction of heterologous polynucleotides into mammalian cells are known in the art and 
include dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated 
transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in 
liposomes, and direct microinjection of the DNA into nuclei. 

Mammalian cell lines available as hosts for expression are known in the art and include many 
immortalized cell lines available from the American Type Culture Collection (ATCC), 
including but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster 
kidney (BHK) cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (eg. 
Hep G2), and a number of other cell lines. 

ii. Baculovirus Systems 

The polynucleotide encoding the protein can also be inserted into a suitable insect expression 
vector, and is operably linked to the control elements within that vector. Vector construction 
employs techniques which are known in the art. Generally, the components of the expression 
system include a transfer vector, usually a bacterial plasmid, which contains both a fragment of 
the baculovirus genome, and a convenient restriction site for insertion of the heterologous gene 
or genes to be expressed; a wild type baculovirus with a sequence homologous to the 
baculovirus-specific fragment in the transfer vector (this allows for the homologous 
recombination of the heterologous gene into the baculovirus genome); and appropriate insect 
host cells and growth media. 

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and 
the wild type viral genome are transfected into an insect host cell where the vector and viral 
genome are allowed to recombine. The packaged recombinant virus is expressed and 
recombinant plaques are identified and purified. Materials and methods for baculovirus/insect 
cell expression systems are commercially available in kit form from, inter alia, Invitrogen, San 
Diego CA ("MaxBac" kit). These techniques are generally known to those skilled in the art and 
fully described in Summers & Smith, Texas Agricultural Experiment Station Bulletin No. 1555 
(1987) ("Summers & Smith"). 

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the 
above described components, comprising a promoter, leader (if desired), coding sequence, and 
transcription termination sequence, are usually assembled into an intermediate transplacement 
construct (transfer vector). This may contain a single gene and operably linked regulatory 
elements; multiple genes, each with its own set of operably linked regulatory elements; or 
multiple genes, regulated by the same set of regulatory elements. Intermediate transplacement 
constructs are often maintained in a replicon, such as an extra-chromosomal element (e.g. 
plasmids) capable of stable maintenance in a host, such as a bacterium. The replicon will have a 
replication system, thus allowing it to be maintained in a suitable host for cloning and 
amplification. 

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is 
pAc373. Many other vectors, known to those of skill in the art, have also been designed. These 
include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and 
which introduces a BamHI cloning site 32 basepairs downstream from the ATT; see Luckow 
and Summers, Virology (1989) 170:31). 

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) 
Ann. Rev. Microbiol., 42:177) and a prokaryotic ampicillin-resistance (amp) gene and origin of 
replication for selection and propagation in E. coli. 

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is 
any DNA sequence capable of binding a baculovirus RNA polymerase and initiating the 
downstream (5' to 3') transcription of a coding sequence (eg. structural gene) into mRNA. A 
promoter will have a transcription initiation region which is usually placed proximal to the 5' 
end of the coding sequence. This transcription initiation region usually includes an RNA 
polymerase binding site and a transcription initiation site. A baculovirus transfer vector may 
also have a second domain called an enhancer, which, if present, is usually distal to the 
structural gene. Expression may be either regulated or constitutive. 

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide 
particularly useful promoter sequences. Examples include sequences derived from the gene 
encoding the viral polyhedrin protein, Friesen et al., (1986) "The Regulation of Baculovirus 
Gene Expression," in: The Molecular Biology of Baculoviruses (ed. Walter Doerfler); EPO 
Publ. Nos. 127 839 and 155 476; and the gene encoding the p10 protein, Vlak et al., (1988), J. 
Gen. Virol. 69:765. 

DNA encoding suitable signal sequences can be derived from genes for secreted insect or 
baculovirus proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene, 
73:409). Alternatively, since the signals for mammalian cell posttranslational modifications 
(such as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be 
recognized by insect cells, and the signals required for secretion and nuclear accumulation also 
appear to be conserved between the invertebrate cells and vertebrate cells, leaders of non-insect 
origin, such as those derived from genes encoding human α-interferon, Maeda et al., (1985), 
Nature 315:592; human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec. 
Cell. Biol. 8:3129; human IL-2, Smith et al., (1985) Proc. Nat'l Acad. Sci. USA, 82:8404; 
mouse IL-3, Miyajima et al., (1987) Gene 58:273; and human glucocerebrosidase, Martin et al. 
(1988) DNA, 7:99, can also be used to provide for secretion in insects. 

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed 
with the proper regulatory sequences, it can be secreted. Good intracellular expression of 
nonfused foreign proteins usually requires heterologous genes that ideally have a short leader 
sequence containing suitable translation initiation signals preceding an ATG start signal. If 
desired, methionine at the N-terminus may be cleaved from the mature protein by in vitro 
incubation with cyanogen bromide. 

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be 
secreted from the insect cell by creating chimeric DNA molecules that encode a fusion protein 
comprised of a leader sequence fragment that provides for secretion of the foreign protein in 
insects. The leader sequence fragment usually encodes a signal peptide comprised of 
hydrophobic amino acids which direct the translocation of the protein into the endoplasmic 
reticulum. 

After insertion of the DNA sequence and/or the gene encoding the expression product precursor 
of the protein, an insect cell host is co-transformed with the heterologous DNA of the transfer 
vector and the genomic DNA of wild type baculovirus, usually by co-transfection. The 
promoter and transcription termination sequence of the construct will usually comprise a 2-5kb 
section of the baculovirus genome. Methods for introducing heterologous DNA into the desired 
site in the baculovirus are known in the art (See Summers & Smith supra; Ju et al. 
(1987); Smith et al., Mol. Cell. Biol. (1983) 3:2156; and Luckow and Summers (1989)). For 
example, the insertion can be into a gene such as the polyhedrin gene, by homologous double 
crossover recombination; insertion can also be into a restriction enzyme site engineered into the 
desired baculovirus gene. Miller et al., (1989), Bioessays 4:91. The DNA sequence, when cloned 
in place of the polyhedrin gene in the expression vector, is flanked both 5' and 3' by 
polyhedrin-specific sequences and is positioned downstream of the polyhedrin promoter. 

The newly formed baculovirus expression vector is subsequently packaged into an infectious 
recombinant baculovirus. Homologous recombination occurs at low frequency (between about 
1% and about 5%); thus, the majority of the virus produced after cotransfection is still wild-type 
virus. Therefore, a method is necessary to identify recombinant viruses. An advantage of the 
expression system is a visual screen allowing recombinant viruses to be distinguished. The 
polyhedrin protein, which is produced by the native virus, is produced at very high levels in the 
nuclei of infected cells at late times after viral infection. Accumulated polyhedrin protein forms 
occlusion bodies that also contain embedded particles. These occlusion bodies, up to 15 μm in 
size, are highly refractile, giving them a bright shiny appearance that is readily visualized under 
the light microscope. Cells infected with recombinant viruses lack occlusion bodies. To 
distinguish recombinant virus from wild-type virus, the transfection supernatant is plaqued onto 
a monolayer of insect cells by techniques known to those skilled in the art. Namely, the plaques 
are screened under the light microscope for the presence (indicative of wild-type virus) or 
absence (indicative of recombinant virus) of occlusion bodies. "Current Protocols in 
Microbiology" Vol. 2 (Ausubel et al. eds) at 16.8 (Supp. 10, 1990); Summers & Smith, supra; 
Miller et al. (1989). 
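
Because recombinants make up only about 1% to 5% of the virus produced, the number of plaques that would need to be screened can be estimated with a simple probability sketch. The calculation below assumes, purely for illustration, that each plaque is independently recombinant with probability p, so that the chance of seeing at least one recombinant among n plaques is 1 - (1 - p)^n. The function name `plaques_to_screen` and the 95% confidence default are hypothetical and are not taken from the cited protocols.

```python
import math


def plaques_to_screen(recombinant_fraction: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p)**n >= confidence.

    Assumes independent plaques, each recombinant with probability p.
    """
    if not 0 < recombinant_fraction < 1:
        raise ValueError("recombinant_fraction must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - recombinant_fraction))


# Recombination frequencies at the two ends of the range stated above.
for p in (0.01, 0.05):
    print(p, plaques_to_screen(p))
```

Under these assumptions, roughly 300 plaques would be examined at a 1% frequency and roughly 60 at a 5% frequency to have a 95% chance of encountering at least one occlusion-negative plaque.
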

Recombinant baculovirus expression vectors have been developed for infection into several 
insect cells. For example, recombinant baculoviruses have been developed for, inter alia: Aedes 
aegypti, Autographa californica, Bombyx mori, Drosophila melanogaster, Spodoptera 
frugiperda, and Trichoplusia ni (WO 89/046699; Carbonell et al., (1985) J. Virol. 56:153; 
Wright (1986) Nature 321:718; Smith et al., (1983) Mol. Cell. Biol. 3:2156; and see generally, 
Fraser, et al. (1989) In Vitro Cell. Dev. Biol. 25:225). 

Cells and cell culture media are commercially available for both direct and fusion expression of 
heterologous polypeptides in a baculovirus/expression system; cell culture technology is 
generally known to those skilled in the art. See, eg. Summers & Smith supra. 

The modified insect cells may then be grown in an appropriate nutrient medium, which allows 
for stable maintenance of the plasmid(s) present in the modified insect host. Where the 
expression product gene is under inducible control, the host may be grown to high density, and 
expression induced. Alternatively, where expression is constitutive, the product will be 
continuously expressed into the medium and the nutrient medium must be continuously 
circulated, while removing the product of interest and augmenting depleted nutrients. The 
product may be purified by such techniques as chromatography, eg. HPLC, affinity 
chromatography, ion exchange chromatography, etc.; electrophoresis; density gradient 
centrifugation; solvent extraction, etc. As appropriate, the product may be further purified, as 
required, so as to remove substantially any insect proteins which are also present in the 
medium, so as to provide a product which is at least substantially free of host debris, eg. 
proteins, lipids and polysaccharides. 

In order to obtain protein expression, recombinant host cells derived from the transformants are 
incubated under conditions which allow expression of the recombinant protein encoding 
sequence. These conditions will vary, dependent upon the host cell selected. However, the 
conditions are readily ascertainable to those of ordinary skill in the art, based upon what is 
known in the art. 

iii. Plant Systems 

There are many plant cell culture and whole plant genetic expression systems known in the art. 
Exemplary plant cellular genetic expression systems include those described in patents, such as: 
US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic expression in 
plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991). 
Descriptions of plant protein signal peptides may be found, in addition to the references 
described above, in Vaulcombe et al., Mol. Gen. Genet. 209:33-40 (1987); Chandler et al., Plant 
Molecular Biology 3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738 (1985); Rothstein 
et al., Gene 55:353-356 (1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); 
Wirsel et al., Molecular Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A 
description of the regulation of plant gene expression by the phytohormone gibberellic acid, and 
of secreted enzymes induced by gibberellic acid, can be found in R.L. Jones and J. MacMillin, 
Gibberellins, in: Advanced Plant Physiology, Malcolm B. Wilkins, ed., 1984 Pitman 
Publishing Limited, London, pp. 21-52. References that describe other metabolically-regulated 
genes: Sheen, Plant Cell 2:1027-1038 (1990); Maas et al., EMBO J. 9:3447-3452 (1990); 
Benkel and Hickey, Proc. Natl. Acad. Sci. 84:1337-1339 (1987). 

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into 
an expression cassette comprising genetic regulatory elements designed for operation in plants. 
The expression cassette is inserted into a desired expression vector with companion sequences 
upstream and downstream from the expression cassette suitable for expression in a plant host. 
The companion sequences will be of plasmid or viral origin and provide necessary 
characteristics to the vector to permit the vectors to move DNA from an original cloning host, 
such as bacteria, to the desired plant host. The basic bacterial/plant vector construct will 
preferably provide a broad host range prokaryote replication origin; a prokaryote selectable 
marker; and, for Agrobacterium transformations, T DNA sequences for 
Agrobacterium-mediated transfer to plant chromosomes. Where the heterologous gene is not 
readily amenable to detection, the construct will preferably also have a selectable marker gene 
suitable for determining if a plant cell has been transformed. A general review of suitable 
markers, for example for the members of the grass family, is found in Wilmink and Dons, 1993, 
Plant Mol. Biol. Reptr, 11(2):165-185. 

Sequences suitable for permitting integration of the heterologous sequence into the plant 
genome are also recommended. These might include transposon sequences and the like for 
homologous recombination as well as Ti sequences which permit random insertion of a 
heterologous expression cassette into a plant genome. Suitable prokaryote selectable markers 
include resistance toward antibiotics such as ampicillin or tetracycline. Other DNA sequences 
encoding additional functions may also be present in the vector, as is known in the art. 

The nucleic acid molecules of the subject invention may be included into an expression cassette 
for expression of the protein(s) of interest. Usually, there will be only one expression cassette, 
although two or more are feasible. The recombinant expression cassette will contain, in addition 
to the heterologous protein encoding sequence, the following elements: a promoter region, plant 
5' untranslated sequences, an initiation codon (depending upon whether or not the structural 
gene comes equipped with one), and a transcription and translation termination sequence. 
Unique restriction enzyme sites at the 5' and 3' ends of the cassette allow for easy insertion into 
a pre-existing vector. 

A heterologous coding sequence may be for any protein relating to the present invention. The 
sequence encoding the protein of interest will encode a signal peptide which allows processing 
and translocation of the protein, as appropriate, and will usually lack any sequence which might 
result in the binding of the desired protein of the invention to a membrane. Since, for the most 
part, the transcriptional initiation region will be for a gene which is expressed and translocated 
during germination, by employing the signal peptide which provides for translocation, one may 
also provide for translocation of the protein of interest. In this way, the protein(s) of interest 
will be translocated from the cells in which they are expressed and may be efficiently harvested. 
Typically, secretion in seeds is across the aleurone or scutellar epithelium layer into the 
endosperm of the seed. While it is not required that the protein be secreted from the cells in 
which the protein is produced, this facilitates the isolation and purification of the recombinant 
protein. 

Since the ultimate expression of the desired gene product will be in a eucaryotic cell it is 
desirable to determine whether any portion of the cloned gene contains sequences which will be 
processed out as introns by the host's spliceosome machinery. If so, site-directed mutagenesis of 
the "intron" region may be conducted to prevent losing a portion of the genetic message as a 
false intron code, Reed and Maniatis, Cell 41:95-105, 1985. 

The vector can be microinjected directly into plant cells by use of micropipettes to 
mechanically transfer the recombinant DNA. Crossway, Mol. Gen. Genet., 202:179-185, 1985. 
The genetic material may also be transferred into the plant cell by using polyethylene glycol, 
Krens, et al., Nature, 296, 72-74, 1982. Another method of introduction of nucleic acid 
segments is high velocity ballistic penetration by small particles with the nucleic acid either 
within the matrix of small beads or particles, or on the surface, Klein, et al., Nature, 327, 70-73, 
1987 and Knudsen and Muller, 1991, Planta, 185:330-336 teaching particle bombardment of 
barley endosperm to create transgenic barley. Yet another method of introduction would be 
fusion of protoplasts with other entities, either minicells, cells, lysosomes or other fusible 
lipid-surfaced bodies, Fraley, et al., Proc. Natl. Acad. Sci. USA, 79, 1859-1863, 1982. 

The vector may also be introduced into the plant cells by electroporation (Fromm et al., Proc. 
Natl. Acad. Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in 
the presence of plasmids containing the gene construct. Electrical impulses of high field 
strength reversibly permeabilize biomembranes allowing the introduction of the plasmids. 
Electroporated plant protoplasts reform the cell wall, divide, and form plant callus. 

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants 
can be transformed by the present invention so that whole plants are recovered which contain 
the transferred gene. It is known that practically all plants can be regenerated from cultured 
cells or tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, 
fruit and other trees, legumes and vegetables. Some suitable plants include, for example, species 
from the genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, 
Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, 
Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, 
Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, 
Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, 
Browallia, Glycine, Lolium, Zea, Triticum, Sorghum, and Datura. 

Means for regeneration vary from species to species of plants, but generally a suspension of 
transformed protoplasts containing copies of the heterologous gene is first provided. Callus 
tissue is formed and shoots may be induced from callus and subsequently rooted. Alternatively, 
embryo formation can be induced from the protoplast suspension. These embryos germinate as 
natural embryos to form plants. The culture media will generally contain various amino acids 
and hormones, such as auxin and cytokinins. It is also advantageous to add glutamic acid and 
proline to the medium, especially for such species as corn and alfalfa. Shoots and roots 
normally develop simultaneously. Efficient regeneration will depend on the medium, on the 
genotype, and on the history of the culture. If these three variables are controlled, then 
regeneration is fully reproducible and repeatable. 

In some plant cell culture systems, the desired protein of the invention may be excreted or, 
alternatively, the protein may be extracted from the whole plant. Where the desired protein of 
the invention is secreted into the medium, it may be collected. Alternatively, the embryos and 
embryoless-half seeds or other plant tissue may be mechanically disrupted to release any 
secreted protein between cells and tissues. The mixture may be suspended in a buffer solution 
to retrieve soluble proteins. Conventional protein isolation and purification methods will be 
then used to purify the recombinant protein. Parameters of time, temperature, pH, oxygen, and 
volumes will be adjusted through routine methods to optimize expression and recovery of 
heterologous protein. 

iv. Bacterial Systems 

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA 
sequence capable of binding bacterial RNA polymerase and initiating the downstream (3') 
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a 
transcription initiation region which is usually placed proximal to the 5' end of the coding 
sequence. This transcription initiation region usually includes an RNA polymerase binding site 
and a transcription initiation site. A bacterial promoter may also have a second domain called 
an operator, that may overlap an adjacent RNA polymerase binding site at which RNA 
synthesis begins. The operator permits negatively regulated (inducible) transcription, as a gene 
repressor protein may bind the operator and thereby inhibit transcription of a specific gene. 
Constitutive expression may occur in the absence of negative regulatory elements, such as the 
operator. In addition, positive regulation may be achieved by a gene activator protein binding 
sequence, which, if present, is usually proximal (5') to the RNA polymerase binding sequence. 
An example of a gene activator protein is the catabolite activator protein (CAP), which helps 
initiate transcription of the lac operon in Escherichia coli (E. coli) [Raibaud et al. (1984) Annu. 
Rev. Genet. 18:173]. Regulated expression may therefore be either positive or negative, thereby 
either enhancing or reducing transcription. 

Sequences encoding metabolic pathway enzymes provide particularly useful promoter 
sequences. Examples include promoter sequences derived from sugar metabolizing enzymes, 
such as galactose, lactose (lac) [Chang et al. (1977) Nature 198:1056], and maltose. Additional
examples include promoter sequences derived from biosynthetic enzymes such as tryptophan
(trp) [Goeddel et al. (1980) Nucl. Acids Res. 8:4057; Yelverton et al. (1981) Nucl. Acids Res.
9:731; US patent 4,738,921; EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla)
promoter system [Weissmann (1981) "The cloning of interferon and other mistakes." In
Interferon 3 (ed. I. Gresser)], bacteriophage lambda PL [Shimatake et al. (1981) Nature
292:128] and T5 [US patent 4,689,406] promoter systems also provide useful promoter
sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial 
promoters. For example, transcription activation sequences of one bacterial or bacteriophage 
promoter may be joined with the operon sequences of another bacterial or bacteriophage 
promoter, creating a synthetic hybrid promoter [US patent 4,551,433]. For example, the tac 
promoter is a hybrid trp-lac promoter comprised of both trp promoter and lac operon sequences 
that is regulated by the lac repressor [Amann et al. (1983) Gene 25:167; de Boer et al. (1983)
Proc. Natl. Acad. Sci. 80:21]. Furthermore, a bacterial promoter can include naturally occurring
promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and 
initiate transcription. A naturally occurring promoter of non-bacterial origin can also be coupled 
with a compatible RNA polymerase to produce high levels of expression of some genes in 
prokaryotes. The bacteriophage T7 RNA polymerase/promoter system is an example of a 
coupled promoter system [Studier et al. (1986) J. Mol. Biol. 189:113; Tabor et al. (1985) Proc.
Natl. Acad. Sci. 82:1074]. In addition, a hybrid promoter can also be comprised of a
bacteriophage promoter and an E. coli operator region (EPO-A-0 267 851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful
for the expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called
the Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9
nucleotides in length located 3-11 nucleotides upstream of the initiation codon [Shine et al.
(1975) Nature 254:34]. The SD sequence is thought to promote binding of mRNA to the
ribosome by the pairing of bases between the SD sequence and the 3' end of E. coli 16S rRNA
[Steitz et al. (1979) "Genetic signals and nucleotide sequences in messenger RNA." In
Biological Regulation and Development: Gene Expression (ed. R.F. Goldberger)]. For the
expression of eukaryotic genes and prokaryotic genes with a weak ribosome-binding site, see
[Sambrook et al. (1989) "Expression of cloned genes in Escherichia coli." In Molecular Cloning:
A Laboratory Manual].
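
The spacing rule for the Shine-Dalgarno sequence can be illustrated with a minimal Python sketch. The function name `find_sd_sites`, the default motif `GGAGG`, and the reading of "3-11 nucleotides upstream" as the gap between the end of the motif and the ATG are all assumptions made for illustration; they are not taken from the cited references.

```python
def find_sd_sites(seq, motif="GGAGG", min_gap=3, max_gap=11):
    """Return (motif_start, atg_start, gap) for each SD-like motif that is
    followed by an ATG within min_gap..max_gap nucleotides.

    The motif and the gap convention are illustrative assumptions.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        end = start + len(motif)  # index just past the motif
        for gap in range(min_gap, max_gap + 1):
            atg = end + gap
            if seq[atg:atg + 3] == "ATG":
                hits.append((start, atg, gap))
        start = seq.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical leader sequence, not from any real gene.
    print(find_sd_sites("TTAGGAGGTAACATATGAAA"))
```

A real ribosome binding site search would score partial matches to the 16S rRNA 3' end rather than require one exact motif; the exact-match scan is used here only to keep the spacing rule visible.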

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly 
linked with the DNA molecule, in which case the first amino acid at the N-terminus will always
be a methionine, which is encoded by the ATG start codon. If desired, methionine at the
N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide or by
either in vivo or in vitro incubation with a bacterial methionine N-terminal peptidase (EPO-A-0
219 237).

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding 
the N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 
5' end of heterologous coding sequences. Upon expression, this construct will provide a fusion
of the two amino acid sequences. For example, the bacteriophage lambda cII gene can be
linked at the 5' terminus of a foreign gene and expressed in bacteria. The resulting fusion 
protein preferably retains a site for a processing enzyme (factor Xa) to cleave the bacteriophage 
protein from the foreign gene [Nagai et al. (1984) Nature 309:810]. Fusion proteins can also be
made with sequences from the lacZ [Jia et al. (1987) Gene 60:197], trpE [Allen et al. (1987) J.
Biotechnol. 5:93; Makoff et al. (1989) J. Gen. Microbiol. 135:11], and CheY [EP-A-0 324 647]
genes. The DNA sequence at the junction of the two amino acid sequences may or may not 
encode a cleavable site. Another example is a ubiquitin fusion protein. Such a fusion protein is 
made with the ubiquitin region that preferably retains a site for a processing enzyme (eg. 
ubiquitin specific processing-protease) to cleave the ubiquitin from the foreign protein. Through 
this method, native foreign protein can be isolated [Miller et al. (1989) Bio/Technology 7:698].

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA
molecules that encode a fusion protein comprised of a signal peptide sequence fragment that
provides for secretion of the foreign protein in bacteria [US patent 4,336,336]. The signal
sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids 
which direct the secretion of the protein from the cell. The protein is either secreted into the 
growth media (gram-positive bacteria) or into the periplasmic space, located between the inner 
and outer membrane of the cell (gram-negative bacteria). Preferably there are processing sites, 
which can be cleaved either in vivo or in vitro, encoded between the signal peptide fragment and
the foreign gene. 

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial 
proteins, such as the E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in:
Experimental Manipulation of Gene Expression; Ghrayeb et al. (1984) EMBO J. 3:2437] and
the E. coli alkaline phosphatase signal sequence (phoA) [Oka et al. (1985) Proc. Natl. Acad.
Sci. 82:7212]. As an additional example, the signal sequence of the alpha-amylase gene from
various Bacillus strains can be used to secrete heterologous proteins from B. subtilis [Palva et
al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions
located 3' to the translation stop codon, and thus together with the promoter flank the coding
sequence. These sequences direct the transcription of an mRNA which can be translated into the 
polypeptide encoded by the DNA. Transcription termination sequences frequently include DNA 
sequences of about 50 nucleotides capable of forming stem loop structures that aid in 
terminating transcription. Examples include transcription termination sequences derived from 
genes with strong promoters, such as the trp gene in E. coli as well as other biosynthetic genes. 
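
As a rough illustration of what "capable of forming stem loop structures" means at the sequence level, the following Python sketch looks for exact inverted repeats separated by a short loop. The function names, the stem length of 6 and the loop range of 3-8 nucleotides are hypothetical choices for this sketch, not values given in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return s.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq, stem=6, min_loop=3, max_loop=8):
    """Return (start, stem_length, loop_length) for every exact inverted
    repeat of length `stem` whose arms are separated by a loop of
    min_loop..max_loop nucleotides. All parameters are illustrative.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = seq[j:j + stem]
            if len(right) == stem and right == revcomp(left):
                hits.append((i, stem, loop))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: GC-rich inverted repeat followed by a T run.
    print(find_stem_loops("CCGCCCGCTTTTGCGGGCTTTTTT"))
```

The sketch only checks Watson-Crick complementarity; it does not model mismatches, G-U pairing in the transcript, or folding energy, so it should not be read as a terminator predictor.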

Usually, the above described components, comprising a promoter, signal sequence (if desired),
coding sequence of interest, and transcription termination sequence, are put together into
expression constructs. Expression constructs are often maintained in a replicon, such as an 
extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as 
bacteria. The replicon will have a replication system, thus allowing it to be maintained in a 
prokaryotic host either for expression or for cloning and amplification. In addition, a replicon 
may be either a high or low copy number plasmid. A high copy number plasmid will generally 
have a copy number ranging from about 5 to about 200, and usually about 10 to about 150. A 
host containing a high copy number plasmid will preferably contain at least about 10, and more 
preferably at least about 20 plasmids. Either a high or low copy number vector may be selected, 
depending upon the effect of the vector and the foreign protein on the host.
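
The order in which the components of an expression construct are put together can be sketched in Python as follows. The class `ExpressionConstruct`, its field names and the whole-codon check are hypothetical and are offered only as one way to picture the 5' to 3' arrangement described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpressionConstruct:
    """Hypothetical container for the components named in the text."""
    promoter: str
    coding_sequence: str
    terminator: str
    signal_sequence: Optional[str] = None  # included only "if desired"

    def assemble(self) -> str:
        """Join the parts 5' to 3': promoter, signal, coding, terminator."""
        signal = self.signal_sequence or ""
        # The signal peptide is fused to the protein, so both coding parts
        # must consist of whole codons for the reading frame to be kept.
        if len(signal) % 3 or len(self.coding_sequence) % 3:
            raise ValueError("signal and coding sequences must be whole codons")
        return self.promoter + signal + self.coding_sequence + self.terminator


if __name__ == "__main__":
    # Placeholder sequences, far shorter than any real element.
    construct = ExpressionConstruct("TTGACA", "ATGAAATAA", "GCGC",
                                    signal_sequence="ATGAAG")
    print(construct.assemble())
```

A practical design would also carry the ribosome binding site, selectable marker and replicon, which are omitted here to keep the ordering clear.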
Alternatively, the expression constructs can be integrated into the bacterial genome with an 
integrating vector. Integrating vectors usually contain at least one sequence homologous to the 
bacterial chromosome that allows the vector to integrate. Integrations appear to result from 
recombinations between homologous DNA in the vector and the bacterial chromosome. For 
example, integrating vectors constructed with DNA from various Bacillus strains integrate into
the Bacillus chromosome (EP-A-0 127 328). Integrating vectors may also be comprised of
bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable 
markers to allow for the selection of bacterial strains that have been transformed. Selectable 
markers can be expressed in the bacterial host and may include genes which render bacteria 
resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin),
and tetracycline [Davies et al. (1978) Annu. Rev. Microbiol. 32:469]. Selectable markers may
also include biosynthetic genes, such as those in the histidine, tryptophan, and leucine 
biosynthetic pathways. 

Alternatively, some of the above described components can be put together in transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either
maintained in a replicon or developed into an integrating vector, as described above.

Expression and transformation vectors, either extra-chromosomal replicons or integrating
vectors, have been developed for transformation into many bacteria. For example, expression
vectors have been developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al.
(1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO
84/04541], Escherichia coli [Shimatake et al. (1981) Nature 292:128; Amann et al. (1985)
Gene 40:183; Studier et al. (1986) J. Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and
EP-A-0 136 907], Streptococcus cremoris [Powell et al. (1988) Appl. Environ. Microbiol.
54:655], Streptococcus lividans [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], and
Streptomyces lividans [US patent 4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and 
usually include either the transformation of bacteria treated with CaCl2 or other agents, such as
divalent cations and DMSO. DNA can also be introduced into bacterial cells by electroporation.
Transformation procedures usually vary with the bacterial species to be transformed. See eg.
[Masson et al. (1989) FEMS Microbiol. Lett. 60:273; Palva et al. (1982) Proc. Natl. Acad. Sci.
USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541, Bacillus], [Miller et al.
(1988) Proc. Natl. Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949,
Campylobacter], [Cohen et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988)
Nucleic Acids Res. 16:6127; Kushner (1978) "An improved method for transformation of
Escherichia coli with ColE1-derived plasmids." In Genetic Engineering: Proceedings of the
International Symposium on Genetic Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et
al. (1970) J. Mol. Biol. 53:159; Taketo (1988) Biochim. Biophys. Acta 949:318; Escherichia],
[Chassy et al. (1987) FEMS Microbiol. Lett. 44:173, Lactobacillus], [Fiedler et al. (1988) Anal.
Biochem. 170:38, Pseudomonas], [Augustin et al. (1990) FEMS Microbiol. Lett. 66:203,
Staphylococcus], [Barany et al. (1980) J. Bacteriol. 144:698; Harlander (1987) "Transformation
of Streptococcus lactis by electroporation," in: Streptococcal Genetics (ed. J. Ferretti and R.
Curtiss III); Perry et al. (1981) Infect. Immun. 32:1295; Powell et al. (1988) Appl. Environ.
Microbiol. 54:655; Somkuti et al. (1987) Proc. 4th Evr. Cong. Biotechnology 1:412,
Streptococcus].

v. Yeast Expression 

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is 
any DNA sequence capable of binding yeast RNA polymerase and initiating the downstream 
(3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a 
transcription initiation region which is usually placed proximal to the 5' end of the coding 
sequence. This transcription initiation region usually includes an RNA polymerase binding site 
(the "TATA Box") and a transcription initiation site. A yeast promoter may also have a second 
domain called an upstream activator sequence (UAS), which, if present, is usually distal to the 
structural gene. The UAS permits regulated (inducible) expression. Constitutive expression 
occurs in the absence of a UAS. Regulated expression may be either positive or negative, 
thereby either enhancing or reducing transcription. 

Yeast is a fermenting organism with an active metabolic pathway, therefore sequences encoding 
enzymes in the metabolic pathway provide particularly useful promoter sequences. Examples 
include alcohol dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase, glucose-6-
phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH),
hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK)
(EPO-A-0 329 203). The yeast PHO5 gene, encoding acid phosphatase, also provides useful
promoter sequences [Myanohara et al. (1983) Proc. Natl. Acad. Sci. USA 80:1].
In addition, synthetic promoters which do not occur in nature also function as yeast promoters. 
For example, UAS sequences of one yeast promoter may be joined with the transcription 
activation region of another yeast promoter, creating a synthetic hybrid promoter. Examples of 
such hybrid promoters include the ADH regulatory sequence linked to the GAP transcription 
activation region (US Patent Nos. 4,876,197 and 4,880,734). Other examples of hybrid 
promoters include promoters which consist of the regulatory sequences of either the ADH2,
GAL4, GAL10, or PHO5 genes, combined with the transcriptional activation region of a
glycolytic enzyme gene such as GAP or PyK (EP-A-0 164 556). Furthermore, a yeast promoter 
can include naturally occurring promoters of non-yeast origin that have the ability to bind yeast 
RNA polymerase and initiate transcription. Examples of such promoters include, inter alia,
[Cohen et al. (1980) Proc. Natl. Acad. Sci. USA 77:1078; Henikoff et al. (1981) Nature
283:635; Hollenberg et al. (1981) Curr. Topics Microbiol. Immunol. 96:119; Hollenberg et al.
(1979) "The Expression of Bacterial Antibiotic Resistance Genes in the Yeast Saccharomyces
cerevisiae," in: Plasmids of Medical, Environmental and Commercial Importance (eds. K.N.
Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene 11:163; Panthier et al. (1980)
Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of 
the recombinant protein will always be a methionine, which is encoded by the ATG start codon. 
If desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation 
with cyanogen bromide. 

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian, 
baculovirus, and bacterial expression systems. Usually, a DNA sequence encoding the N- 
terminal portion of an endogenous yeast protein, or other stable protein, is fused to the 5' end of 
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two 
amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene can
be linked at the 5' terminus of a foreign gene and expressed in yeast. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. See eg.
EP-A-0 196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with
the ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific
processing protease) to cleave the ubiquitin from the foreign protein. Through this method,
therefore, native foreign protein can be isolated (eg. WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by
creating chimeric DNA molecules that encode a fusion protein comprised of a leader sequence
fragment that provides for secretion in yeast of the foreign protein. Preferably, there are
processing sites encoded between the leader fragment and the foreign gene that can be cleaved 
either in vivo or in vitro. The leader sequence fragment usually encodes a signal peptide 
comprised of hydrophobic amino acids which direct the secretion of the protein from the cell. 
DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins, 
such as the yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086) and the A-factor gene (US 
patent 4,588,684). Alternatively, leaders of non-yeast origin, such as an interferon leader, exist 
that also provide for secretion in yeast (EP-A-0 060 057). 

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor 
gene, which contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor 
fragments that can be employed include the full-length pre-pro alpha factor leader (about 83 
amino acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 
amino acid residues) (US Patents 4,546,083 and 4,870,008; EP-A-0 324 274). Additional 
leaders employing an alpha-factor leader fragment that provides for secretion include hybrid 
alpha-factor leaders made with a presequence of a first yeast, but a pro-region from a second
yeast alpha-factor (eg. see WO 89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 
3' to the translation stop codon, and thus together with the promoter flank the coding sequence.
These sequences direct the transcription of an mRNA which can be translated into the
polypeptide encoded by the DNA. Examples include transcription terminator sequences and other
yeast-recognized termination sequences, such as those coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression 
constructs. Expression constructs are often maintained in a replicon, such as an 
extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as yeast 
or bacteria. The replicon may have two replication systems, thus allowing it to be maintained, 
for example, in yeast for expression and in a prokaryotic host for cloning and amplification. 
Examples of such yeast-bacteria shuttle vectors include YEp24 [Botstein et al. (1979) Gene
8:17-24], pCl/1 [Brake et al. (1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17
[Stinchcomb et al. (1982) J. Mol. Biol. 158:157]. In addition, a replicon may be either a high or
low copy number plasmid. A high copy number plasmid will generally have a copy number 
ranging from about 5 to about 200, and usually about 10 to about 150. A host containing a high 
copy number plasmid will preferably have at least about 10, and more preferably at least about
20. Either a high or low copy number vector may be selected, depending upon the effect of the
vector and the foreign protein on the host. See eg. Brake et al., supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an
integrating vector. Integrating vectors usually contain at least one sequence homologous to a 
yeast chromosome that allows the vector to integrate, and preferably contain two homologous 
sequences flanking the expression construct. Integrations appear to result from recombinations 
between homologous DNA in the vector and the yeast chromosome [Orr-Weaver et al. (1983)
Methods in Enzymol. 101:228-245]. An integrating vector may be directed to a specific locus in
yeast by selecting the appropriate homologous sequence for inclusion in the vector. See
Orr-Weaver et al., supra. One or more expression constructs may integrate, possibly affecting levels
of recombinant protein produced [Rine et al. (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The
chromosomal sequences included in the vector can occur either as a single segment in the
vector, which results in the integration of the entire vector, or two segments homologous to
adjacent segments in the chromosome and flanking the expression construct in the vector,
which can result in the stable integration of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable
markers to allow for the selection of yeast strains that have been transformed. Selectable
markers may include biosynthetic genes that can be expressed in the yeast host, such as ADE2,
HIS4, LEU2, TRP1, and ALG7, and the G418 resistance gene, which confer resistance in yeast 
cells to tunicamycin and G418, respectively. In addition, a suitable selectable marker may also 
provide yeast with the ability to grow in the presence of toxic compounds, such as metal. For 
example, the presence of CUP1 allows yeast to grow in the presence of copper ions [Butt et al.
(1987) Microbiol. Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either 
maintained in a replicon or developed into an integrating vector, as described above. 
Expression and transformation vectors, either extrachromosomal replicons or integrating 
vectors, have been developed for transformation into many yeasts. For example, expression 
vectors have been developed for, inter alia, the following yeasts: Candida albicans [Kurtz et al.
(1986) Mol. Cell. Biol. 6:142], Candida maltosa [Kunze et al. (1985) J. Basic Microbiol.
25:141], Hansenula polymorpha [Gleeson et al. (1986) J. Gen. Microbiol. 132:3459;
Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302], Kluyveromyces fragilis [Das et al.
(1984) J. Bacteriol. 158:1165], Kluyveromyces lactis [De Louvencourt et al. (1983) J.
Bacteriol. 154:737; Van den Berg et al. (1990) Bio/Technology 8:135], Pichia guillerimondii
[Kunze et al. (1985) J. Basic Microbiol. 25:141], Pichia pastoris [Cregg et al. (1985) Mol. Cell.
Biol. 5:3376; US Patent Nos. 4,837,148 and 4,929,555], Saccharomyces cerevisiae [Hinnen et
al. (1978) Proc. Natl. Acad. Sci. USA 75:1929; Ito et al. (1983) J. Bacteriol. 153:163],
Schizosaccharomyces pombe [Beach and Nurse (1981) Nature 300:706], and Yarrowia
lipolytica [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al. (1985) Curr. Genet.
10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually 
include either the transformation of spheroplasts or of intact yeast cells treated with alkali 
cations. Transformation procedures usually vary with the yeast species to be transformed. See 
eg. [Kurtz et al. (1986) Mol. Cell. Biol. 6:142; Kunze et al. (1985) J. Basic Microbiol. 25:141;
Candida]; [Gleeson et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol.
Gen. Genet. 202:302; Hansenula]; [Das et al. (1984) J. Bacteriol. 158:1165; De Louvencourt et
al. (1983) J. Bacteriol. 154:737; Van den Berg et al. (1990) Bio/Technology 8:135;
Kluyveromyces]; [Cregg et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J. Basic
Microbiol. 25:141; US Patent Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al. (1978)
Proc. Natl. Acad. Sci. USA 75:1929; Ito et al. (1983) J. Bacteriol. 153:163; Saccharomyces];
[Beach and Nurse (1981) Nature 300:706; Schizosaccharomyces]; [Davidow et al. (1985) Curr.
Genet. 10:39; Gaillardin et al. (1985) Curr. Genet. 10:49; Yarrowia].

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed 
of at least one antibody combining site. An "antibody combining site" is the three-dimensional 
binding space with an internal surface shape and charge distribution complementary to the 
features of an epitope of an antigen, which allows a binding of the antibody with the antigen. 
"Antibody" includes, for example, vertebrate antibodies, hybrid antibodies, chimeric antibodies, 
humanised antibodies, altered antibodies, univalent antibodies, Fab proteins, and single domain 
antibodies. 

Antibodies against the proteins of the invention are useful for affinity chromatography,
immunoassays, and distinguishing/identifying Streptococcal proteins.

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared
by conventional methods. In general, the protein is first used to immunize a suitable animal,
preferably a mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of
polyclonal sera due to the volume of serum obtainable, and the availability of labeled anti-rabbit 
and anti-goat antibodies. Immunization is generally performed by mixing or emulsifying the 
protein in saline, preferably in an adjuvant such as Freund's complete adjuvant, and injecting 
the mixture or emulsion parenterally (generally subcutaneously or intramuscularly). A dose of 
50-200 µg/injection is typically sufficient. Immunization is generally boosted 2-6 weeks later
with one or more injections of the protein in saline, preferably using Freund's incomplete 
adjuvant. One may alternatively generate antibodies by in vitro immunization using methods 
known in the art, which for the purposes of this invention is considered equivalent to in vivo 
immunization. Polyclonal antisera is obtained by bleeding the immunized animal into a glass or
plastic container, incubating the blood at 25°C for one hour, followed by incubating at 4°C for
2-18 hours. The serum is recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50
ml per bleed may be obtained from rabbits. 

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature 
(1975) 256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as 
described above. However, rather than bleeding the animal to extract serum, the spleen (and 
optionally several large lymph nodes) is removed and dissociated into single cells. If desired, 
the spleen cells may be screened (after removal of nonspecifically adherent cells) by applying a 
cell suspension to a plate or well coated with the protein antigen. B-cells expressing 
membrane-bound immunoglobulin specific for the antigen bind to the plate, and are not rinsed 
away with the rest of the suspension. Resulting B-cells, or all dissociated spleen cells, are then 
induced to fuse with myeloma cells to form hybridomas, and are cultured in a selective medium 
(eg. hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are
plated by limiting dilution, and are assayed for production of antibodies which bind specifically 
to the immunizing antigen (and which do not bind to unrelated antigens). The selected 
MAb-secreting hybridomas are then cultured either in vitro (eg. in tissue culture bottles or 
hollow fiber reactors), or in vivo (as ascites in mice). 
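
The screening criterion described above (binding to the immunizing antigen, no binding to unrelated antigens) can be written as a simple filter. In this Python sketch the function name, the data layout and both cutoff values are hypothetical; real cutoffs depend on the assay and its controls.

```python
def select_specific_clones(readings, positive_cutoff=1.0, background_cutoff=0.2):
    """Return the sorted names of clones whose signal on the immunizing
    antigen is at or above positive_cutoff and whose signal on an unrelated
    antigen is at or below background_cutoff.

    `readings` maps clone name -> (target_signal, unrelated_signal).
    The cutoffs are illustrative assumptions, not values from the text.
    """
    selected = []
    for clone, (target_signal, unrelated_signal) in readings.items():
        if target_signal >= positive_cutoff and unrelated_signal <= background_cutoff:
            selected.append(clone)
    return sorted(selected)


if __name__ == "__main__":
    # Made-up plate readings for three hypothetical wells.
    plate = {"A1": (1.8, 0.1), "B2": (1.5, 1.2), "C3": (0.1, 0.05)}
    print(select_specific_clones(plate))
```

In this made-up example, well B2 binds both antigens and well C3 binds neither, so only A1 satisfies both conditions.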

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using 
conventional techniques. Suitable labels include fluorophores, chromophores, radioactive atoms 
(particularly 32P and 125I), electron-dense reagents, enzymes, and ligands having specific
binding partners. Enzymes are typically detected by their activity. For example, horseradish
peroxidase is usually detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a
blue pigment, quantifiable with a spectrophotometer. "Specific binding partner" refers to a 
protein capable of binding a ligand molecule with high specificity, as for example in the case of 
an antigen and a monoclonal antibody specific therefor. Other specific binding partners include 
biotin and avidin or streptavidin, IgG and protein A, and the numerous receptor-ligand couples 
known in the art. It should be understood that the above description is not meant to categorize
the various labels into distinct classes, as the same label may serve in several different modes.
For example, 125I may serve as a radioactive label or as an electron-dense reagent. HRP may
serve as enzyme or as antigen for a MAb. Further, one may combine various labels for desired
effect. For example, MAbs and avidin also require labels in the practice of this invention: thus,
one might label a MAb with biotin, and detect its presence with avidin labeled with 125I, or with
an anti-biotin MAb labeled with HRP. Other permutations and possibilities will be readily 
apparent to those of ordinary skill in the art, and are considered as equivalents within the scope 
of the instant invention. 

Pharmaceutical Compositions 

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of 
the invention. The pharmaceutical compositions will comprise a therapeutically effective
amount of either polypeptides, antibodies, or polynucleotides of the claimed invention. 
The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic 
agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable 
therapeutic or preventative effect. The effect can be detected by, for example, chemical markers 
or antigen levels. Therapeutic effects also include reduction in physical symptoms, such as 
decreased body temperature. The precise effective amount for a subject will depend upon the 
subject's size and health, the nature and extent of the condition, and the therapeutics or
combination of therapeutics selected for administration. Thus, it is not useful to specify an exact 
effective amount in advance. However, the effective amount for a given situation can be 
determined by routine experimentation and is within the judgement of the clinician. 
For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50
mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is
administered. 
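
The per-kilogram ranges stated above translate into absolute amounts by simple multiplication. The Python sketch below shows only that arithmetic; the function name `dose_range_mg` is hypothetical, the default bounds are the broader range quoted in the text, and nothing here is a dosing recommendation.

```python
def dose_range_mg(weight_kg, low_mg_per_kg=0.01, high_mg_per_kg=50.0):
    """Convert a mg/kg range into an absolute (low, high) range in mg for a
    subject of the given weight. Defaults follow the broader range in the
    text; pass 0.05 and 10.0 for the narrower one.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("low bound exceeds high bound")
    return (weight_kg * low_mg_per_kg, weight_kg * high_mg_per_kg)


if __name__ == "__main__":
    # Hypothetical 70 kg subject, both ranges from the text.
    print(dose_range_mg(70))
    print(dose_range_mg(70, 0.05, 10.0))
```

As the surrounding text notes, the amount actually used in a given situation is determined by routine experimentation and the judgement of the clinician, not by the range alone.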

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic 
agent, such as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers 
to any pharmaceutical carrier that does not itself induce the production of antibodies harmful to 
the individual receiving the composition, and which may be administered without undue 
toxicity. Suitable carriers may be large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid
copolymers, and inactive virus particles. Such carriers are well known to those of ordinary skill
in the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids
such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences
(Mack Pub. Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as
water, saline, glycerol and ethanol. Additionally, auxiliary substances, such as wetting or 
emulsifying agents, pH buffering substances, and the like, may be present in such vehicles. 
Typically, the therapeutic compositions are prepared as injectables, either as liquid solutions or 
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to 
injection may also be prepared. Liposomes are included within the definition of a 
pharmaceutically acceptable carrier. 

Delivery Methods 

Once formulated, the compositions of the invention can be administered directly to the subject.
The subjects to be treated can be animals; in particular, human subjects can be treated. 
Direct delivery of the compositions will generally be accomplished by injection, either 
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the 
interstitial space of a tissue. The compositions can also be administered into a lesion. Other 
modes of administration include oral and pulmonary administration, suppositories, and 
transdermal or transcutaneous applications (eg. see WO98/20734), needles, and gene guns or 
hyposprays. Dosage treatment may be a single dose schedule or a multiple dose schedule. 
See also Delivery Strategies for Antisense Oligonucleotide Therapeutics (ed. Akhtar) ISBN
0849347785.

Vaccines

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or 
therapeutic (ie. to treat disease after infection). 

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or 
nucleic acid, usually in combination with "pharmaceutically acceptable carriers," which include
any carrier that does not itself induce the production of antibodies harmful to the individual
receiving the composition. Suitable carriers are typically large, slowly metabolized
macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids,
polymeric amino acids, amino acid copolymers, lipid aggregates (such as oil droplets or liposomes),
and inactive virus particles. Such carriers are well known to those of ordinary skill in the art.
Additionally, these carriers may function as immunostimulating agents ("adjuvants"). 
Furthermore, the antigen or immunogen may be conjugated to a bacterial toxoid, such as a 
toxoid from diphtheria, tetanus, cholera, H. pylori, etc. pathogens. 

Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: 
(1) aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate, aluminum 
sulfate, etc; (2) oil-in-water emulsion formulations (with or without other specific 
immunostimulating agents such as muramyl peptides (see below) or bacterial cell wall 
components), such as for example (a) MF59™ (WO 90/14837; Chapter 10 in Vaccine design: 
the subunit and adjuvant approach, eds. Powell & Newman, Plenum Press 1995), containing 
5% Squalene, 0.5% Tween 80, and 0.5% Span 85 (optionally containing various amounts of
MTP-PE (see below), although not required) formulated into submicron particles using a
microfluidizer such as Model 110Y microfluidizer (Microfluidics, Newton, MA), (b) SAF,
containing 10% Squalane, 0.4% Tween 80, 5% pluronic-blocked polymer L121, and thr-MDP 
(see below) either microfluidized into a submicron emulsion or vortexed to generate a larger 
particle size emulsion, and (c) Ribi™ adjuvant system (RAS), (Ribi Immunochem, Hamilton, 
MT) containing 2% Squalene, 0.2% Tween 80, and one or more bacterial cell wall components 
from the group consisting of monophosphorylipid A (MPL), trehalose dimycolate (TDM), and 
cell wall skeleton (CWS), preferably MPL + CWS (Detox™); (3) saponin adjuvants, such as 
Stimulon™ (Cambridge Bioscience, Worcester, MA) may be used or particles generated
therefrom such as ISCOMs (immunostimulating complexes); (4) Complete Freund's Adjuvant
(CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as interleukins (eg. IL-1,
IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (eg. gamma interferon), macrophage
colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc; and (6) other substances 
that act as immunostimulating agents to enhance the effectiveness of the composition. Alum 
and MF59™ are preferred. 
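
The nominal emulsion compositions listed above can be turned into per-batch amounts with simple arithmetic. The Python sketch below is illustrative only: the dictionary and function names are hypothetical, and because the text does not state whether the percentages are volume/volume or weight/volume, the result is expressed in "percent-units" (mL if read as v/v, g if read as w/v). Components listed without a percentage (for example thr-MDP, MPL or CWS) are omitted.

```python
# Nominal percentages of the final emulsion, as listed in the text.
# The basis (v/v or w/v) is an open assumption; see the lead-in.
FORMULATIONS = {
    "MF59": {"squalene": 5.0, "Tween 80": 0.5, "Span 85": 0.5},
    "SAF": {"squalane": 10.0, "Tween 80": 0.4, "pluronic L121": 5.0},
    "RAS": {"squalene": 2.0, "Tween 80": 0.2},
}


def component_amounts(name, batch_ml):
    """Return {component: amount} for a batch of batch_ml millilitres.

    Amounts are percent-of-batch values: mL under a v/v reading,
    grams under a w/v reading.
    """
    if batch_ml <= 0:
        raise ValueError("batch_ml must be positive")
    recipe = FORMULATIONS[name]
    return {component: pct / 100.0 * batch_ml for component, pct in recipe.items()}


if __name__ == "__main__":
    for formulation in FORMULATIONS:
        print(formulation, component_amounts(formulation, 100.0))
```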

As mentioned above, muramyl peptides include, but are not limited to,
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (MTP-PE), etc.

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/
nucleic acid, pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, 
such as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting 
or emulsifying agents, pH buffering substances, and the like, may be present in such vehicles. 
Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions 
or suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to 
injection may also be prepared. The preparation also may be emulsified or encapsulated in 
liposomes for enhanced adjuvant effect, as discussed above under pharmaceutically acceptable 
carriers. 

Immunogenic compositions used as vaccines comprise an immunologically effective amount of 
the antigenic or immunogenic polypeptides, as well as any other of the above-mentioned 
components, as needed. By "immunologically effective amount", it is meant that the
administration of that amount to an individual, either in a single dose or as part of a series, is effective
for treatment or prevention. This amount varies depending upon the health and physical
condition of the individual to be treated, the taxonomic group of individual to be treated (eg.
nonhuman primate, primate, etc.), the capacity of the individual's immune system to synthesize
antibodies, the degree of protection desired, the formulation of the vaccine, the treating doctor's 
assessment of the medical situation, and other relevant factors. It is expected that the amount 
will fall in a relatively broad range that can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection,
either subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734).
Additional formulations suitable for other modes of administration include oral and pulmonary 
formulations, suppositories, and transdermal applications. Dosage treatment may be a single 
dose schedule or a multiple dose schedule. The vaccine may be administered in conjunction 
with other immunoregulatory agents. 

As an alternative to protein-based vaccines, DNA vaccination may be used [eg. Robinson &
Torres (1997) Seminars in Immunol 9:271-283; Donnelly et al (1997) Annu Rev Immunol
15:617-648; later herein].

Gene Delivery Vehicles 

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of 
the invention, to be delivered to the mammal for expression in the mammal, can be 
administered either locally or systemically. These constructs can utilize viral or non-viral vector 
approaches in in vivo or ex vivo modality. Expression of such coding sequence can be induced 
using endogenous mammalian or heterologous promoters. Expression of the coding sequence in 
vivo can be either constitutive or regulated. 

The invention includes gene delivery vehicles capable of expressing the contemplated nucleic 
acid sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a 
retroviral, adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The 
viral vector can also be an astrovirus, coronavirus, orthomyxovirus, papovavirus,
paramyxovirus, parvovirus, picornavirus, poxvirus, or togavirus viral vector. See generally,
Jolly (1994) Cancer Gene Therapy 1:51-64; Kimura (1994) Human Gene Therapy 5:845-852;
Connelly (1995) Human Gene Therapy 6:185-193; and Kaplitt (1994) Nature Genetics 
6:148-153. 

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy 
vector is employable in the invention, including B, C and D type retroviruses, xenotropic 
retroviruses (for example, NZB-X1, NZB-X2 and NZB9-1 (see O'Neill (1985) J. Virol 53:160)),
polytropic retroviruses eg. MCF and MCF-MLV (see Kelly (1983) J. Virol 45:291),
spumaviruses and lentiviruses. See RNA Tumor Viruses, Second Edition, Cold Spring Harbor
Laboratory, 1985. 

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For 
example, retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site
from a Rous Sarcoma Virus, a packaging signal from a Murine Leukemia Virus, and an origin 
of second strand synthesis from an Avian Leukosis Virus. 

These recombinant retroviral vectors may be used to generate transduction competent retroviral 
vector particles by introducing them into appropriate packaging cell lines (see US patent 
5,591,624). Retrovirus vectors can be constructed for site-specific integration into host cell 
DNA by incorporation of a chimeric integrase enzyme into the retroviral particle (see 
WO96/37626). It is preferable that the recombinant viral vector is a replication defective
recombinant virus. 

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known 
in the art, are readily prepared (see WO95/30763 and WO92/05266), and can be used to create
producer cell lines (also termed vector cell lines or "VCLs") for the production of recombinant 
vector particles. Preferably, the packaging cell lines are made from human parent cells (eg. 
HT1080 cells) or mink parent cell lines, which eliminates inactivation in human serum. 
Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian 
Leukosis Virus, Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing
Virus, Murine Sarcoma Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. 
Particularly preferred Murine Leukemia Viruses include 4070A and 1504A (Hartley and Rowe
(1976) J Virol 19:19-25), Abelson (ATCC No. VR-999), Friend (ATCC No. VR-245), Graffi,
Gross (ATCC No. VR-590), Kirsten, Harvey Sarcoma Virus and Rauscher (ATCC No.
VR-998) and Moloney Murine Leukemia Virus (ATCC No. VR-190). Such retroviruses may be
obtained from depositories or collections such as the American Type Culture Collection
("ATCC") in Rockville, Maryland or isolated from known sources using commonly available
techniques. 

Exemplary known retroviral gene therapy vectors employable in this invention include those 
described in patent applications GB2200651, EP0415731, EP0345242, EP0334301,
WO89/02468, WO89/05349, WO89/09271, WO90/02806, WO90/07936, WO94/03622,
WO93/25698, WO93/25234, WO93/11230, WO93/10218, WO91/02805, WO91/02825,
WO95/07994, US 5,219,740, US 4,405,712, US 4,861,719, US 4,980,289, US 4,777,127, US
5,591,624. See also Vile (1993) Cancer Res 53:3860-3864; Vile (1993) Cancer Res
53:962-967; Ram (1993) Cancer Res 53:83-88; Takamiya (1992) J Neurosci Res
33:493-503; Baba (1993) J Neurosurg 79:729-735; Mann (1983) Cell 33:153; Cone (1984)
Proc Natl Acad Sci 81:6349; and Miller (1990) Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this
invention. See, for example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 
252:431, and WO93/07283, WO93/06223, and WO93/07282. Exemplary known adenoviral 
gene therapy vectors employable in this invention include those described in the above 
referenced documents and in WO94/12649, WO93/03769, WO93/19191, WO94/28938,
WO95/11984, WO95/00655, WO95/27071, WO95/29993, WO95/34671, WO96/05320,
WO94/08026, WO94/11506, WO93/06223, WO94/24299, WO95/14102, WO95/24297,
WO95/02697, WO94/28152, WO94/24299, WO95/09241, WO95/25807, WO95/05835,
WO94/18922 and WO95/09654. Alternatively, administration of DNA linked to killed
adenovirus as described in Curiel (1992) Hum. Gene Ther. 3:147-154 may be employed.

The gene delivery vehicles of the invention also include adenovirus associated virus (AAV) vectors.
Leading and preferred examples of such vectors for use in this invention are the AAV-2 based 
vectors disclosed in Srivastava, WO93/09239. Most preferred AAV vectors comprise the two 
AAV inverted terminal repeats in which the native D-sequences are modified by substitution of 
nucleotides, such that at least 5 native nucleotides and up to 18 native nucleotides, preferably at 
least 10 native nucleotides up to 18 native nucleotides, most preferably 10 native nucleotides 
are retained and the remaining nucleotides of the D-sequence are deleted or replaced with 
non-native nucleotides. The native D-sequences of the AAV inverted terminal repeats are 
sequences of 20 consecutive nucleotides in each AAV inverted terminal repeat (ie. there is one 
sequence at each end) which are not involved in HP formation. The non-native replacement 
nucleotide may be any nucleotide other than the nucleotide found in the native D-sequence in 
the same position. Other employable exemplary AAV vectors are pWP-19 and pWN-1, both of
which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of such an AAV
vector is psub201 (see Samulski (1987) J. Virol 61:3096). Another exemplary AAV vector is
the Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US Patent
5,478,745. Still other vectors are those disclosed in Carter US Patent 4,797,368 and Muzyczka
US Patent 5,139,941, Chatterjee US Patent 5,474,935, and Kotin WO94/288157. Yet a further
example of an AAV vector employable in this invention is SSV9AFABTKneo, which contains
the AFP enhancer and albumin promoter and directs expression predominantly in the liver. Its 
structure and construction are disclosed in Su (1996) Human Gene Therapy 7:463-470. 
Additional AAV gene therapy vectors are described in US 5,354,678, US 5,173,414, US 
5,139,941, and US 5,252,479. 

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred 
examples are herpes simplex virus vectors containing a sequence encoding a thymidine kinase 
polypeptide such as those disclosed in US 5,288,641 and EP0176170 (Roizman). Additional 
exemplary herpes simplex virus vectors include HFEM/ICP6-LacZ disclosed in WO95/04139 
(Wistar Institute), pHSVlac described in Geller (1988) Science 241:1667-1669 and in 
WO90/09441 and WO92/07945, HSV Us3::pgC-lacZ described in Fink (1992) Human Gene 
Therapy 3:11-19 and HSV 7134, 2 RH 105 and GAL4 described in EP 0453242 (Breakefield),
and those deposited with the ATCC with accession numbers VR-977 and VR-260.

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention.
Preferred alpha virus vectors are Sindbis virus vectors, Togaviruses, Semliki Forest virus
(ATCC VR-67; ATCC VR-1247), Middelburg virus (ATCC VR-370), Ross River virus (ATCC
VR-373; ATCC VR-1246), Venezuelan equine encephalitis virus (ATCC VR-923; ATCC
VR-1250; ATCC VR-1249; ATCC VR-532), and those described in US patents 5,091,309, 
5,217,879, and WO92/10578. More particularly, those alpha virus vectors described in US 
Serial No. 08/405,627, filed March 15, 1995, WO94/21792, WO92/10578, WO95/07994, US
5,091,309 and US 5,217,879 are employable. Such alpha viruses may be obtained from 
depositories or collections such as the ATCC in Rockville, Maryland or isolated from known 
sources using commonly available techniques. Preferably, alphavirus vectors with reduced 
cytotoxicity are used (see USSN 08/679640).

DNA vector systems such as eukaryotic layered expression systems are also useful for
expressing the nucleic acids of the invention. See WO95/07994 for a detailed description of
eukaryotic layered expression systems. Preferably, the eukaryotic layered expression systems of
the invention are derived from alphavirus vectors and most preferably from Sindbis viral 
vectors. 

Other viral vectors suitable for use in the present invention include those derived from 
poliovirus, for example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and 
Sabin (1973) J. Biol Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those
described in Arnold (1990) J Cell Biochem L401; pox viruses such as canary pox virus or
vaccinia virus, for example ATCC VR-111 and ATCC VR-2010 and those described in
Fisher-Hoch (1989) Proc Natl Acad Sci 86:317; Flexner (1989) Ann NY Acad Sci 569:86,
Flexner (1990) Vaccine 8:17; in US 4,603,112 and US 4,769,330 and WO89/01973; SV40
virus, for example ATCC VR-305 and those described in Mulligan (1979) Nature 277:108 and
Madzak (1992) J Gen Virol 73:1533; influenza virus, for example ATCC VR-797 and
recombinant influenza viruses made employing reverse genetics techniques as described in US
5,166,057 and in Enami (1990) Proc Natl Acad Sci 87:3802-3805; Enami & Palese (1991) J
Virol 65:2711-2713 and Luytjes (1989) Cell 59:110, (see also McMichael (1983) NEJ Med
309:13, and Yap (1978) Nature 273:238 and Nature (1979) 277:108); human
immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992) J. Virol.
66:2731; measles virus, for example ATCC VR-67 and VR-1247 and those described in
EP-0440219; Aura virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600
and ATCC VR-1240; Cabassou virus, for example ATCC VR-922; Chikungunya virus, for
example ATCC VR-64 and ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924; 
Getah virus, for example ATCC VR-369 and ATCC VR-1243; Kyzylagach virus, for example 
ATCC VR-927; Mayaro virus, for example ATCC VR-66; Mucambo virus, for example ATCC 
VR-580 and ATCC VR-1244; Ndumu virus, for example ATCC VR-371; Pixuna virus, for
example ATCC VR-372 and ATCC VR-1245; Tonate virus, for example ATCC VR-925; 
Triniti virus, for example ATCC VR-469; Una virus, for example ATCC VR-374; Whataroa 
virus, for example ATCC VR-926; Y-62-33 virus, for example ATCC VR-375; O'Nyong virus,
Eastern encephalitis virus, for example ATCC VR-65 and ATCC VR-1242; Western
encephalitis virus, for example ATCC VR-70, ATCC VR-1251, ATCC VR-622 and ATCC 
VR-1252; and coronavirus, for example ATCC VR-740 and those described in Hamre (1966) 
Proc Soc Exp Biol Med 121:190. 

Delivery of the compositions of this invention into cells is not limited to the above mentioned 
viral vectors. Other delivery methods and media may be employed such as, for example, nucleic 
acid expression vectors, polycationic condensed DNA linked or unlinked to killed adenovirus 
alone, for example see US Serial No. 08/366,787, filed December 30, 1994 and Curiel (1992) 
Hum Gene Ther 3:147-154, ligand linked DNA, for example see Wu (1989) J Biol Chem
264:16985-16987, eucaryotic cell delivery vehicles cells, for example see US Serial 
No.08/240,030, filed May 9, 1994, and US Serial No. 08/404,796, deposition of 
photopolymerized hydrogel materials, hand-held gene transfer particle gun, as described in US 
Patent 5,149,655, ionizing radiation as described in US 5,206,152 and in WO92/11033, nucleic
charge neutralization or fusion with cell membranes. Additional approaches are described in
Philip (1994) Mol Cell Biol 14:2411-2418 and in Woffendin (1994) Proc Natl Acad Sci
91:1581-1585. 

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867. 
Briefly, the sequence can be inserted into conventional vectors that contain conventional control 
sequences for high level expression, and then incubated with synthetic gene transfer molecules 
such as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell
targeting ligands such as asialoorosomucoid, as described in Wu & Wu (1987) J. Biol Chem.
262:4429-4432, insulin as described in Hucked (1990) Biochem Pharmacol 40:253-263, 
galactose as described in Plank (1992) Bioconjugate Chem 3:533-539, lactose or transferrin. 
Naked DNA may also be employed. Exemplary naked DNA introduction methods are described 
in WO 90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable
latex beads. DNA coated latex beads are efficiently transported into cells after endocytosis
initiation by the beads. The method may be improved further by treatment of the beads to 
increase hydrophobicity and thereby facilitate disruption of the endosome and release of the 
DNA into the cytoplasm.

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796,
WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral
delivery, the nucleic acid sequences encoding a polypeptide can be inserted into conventional 
vectors that contain conventional control sequences for high level expression, and then be 
incubated with synthetic gene transfer molecules such as polymeric DNA-binding cations like 
polylysine, protamine, and albumin, linked to cell targeting ligands such as asialoorosomucoid, 
insulin, galactose, lactose, or transferrin. Other delivery systems include the use of liposomes to 
encapsulate DNA comprising the gene under the control of a variety of tissue-specific or 
ubiquitously-active promoters. Further non-viral delivery suitable for use includes mechanical 
delivery systems such as the approach described in Woffendin et al (1994) Proc. Natl. Acad.
Sci. USA 91(24):11581-11585. Moreover, the coding sequence and the product of expression of
such can be delivered through deposition of photopolymerized hydrogel materials. Other 
conventional methods for gene delivery that can be used for delivery of the coding sequence 
include, for example, use of hand-held gene transfer particle gun, as described in US 5,149,655; 
use of ionizing radiation for activating transferred gene, as described in US 5,206,152 and 
WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 
5,422,120 and 4,762,915; in WO 95/13796; WO94/23697; and WO91/14445; in EP-0524968;
and in Stryer, Biochemistry, pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka
(1980) Biochem Biophys Acta 600:1; Bayer (1979) Biochem Biophys Acta 550:464; Rivnay
(1987) Meth Enzymol 149:119; Wang (1987) Proc Natl Acad Sci 84:7851; Plant (1989) Anal 
Biochem 176:420. 

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy
vehicle, as the term is defined above. For purposes of the present invention, an effective dose
will be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA
constructs in the individual to which it is administered.

Delivery Methods

Once formulated, the polynucleotide compositions of the invention can be administered (1) 
directly to the subject; (2) delivered ex vivo, to cells derived from the subject; or (3) in vitro for 
expression of recombinant proteins. The subjects to be treated can be mammals or birds. Also, 
human subjects can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either 
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the 
interstitial space of a tissue. The compositions can also be administered into a lesion. Other 
modes of administration include oral and pulmonary administration, suppositories, and 
transdermal or transcutaneous applications (eg. see WO98/20734), needles, and gene guns or 
hyposprays. Dosage treatment may be a single dose schedule or a multiple dose schedule. 
Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are 
known in the art and described in eg. WO93/14778. Examples of cells useful in ex vivo
applications include, for example, stem cells, particularly hematopoietic, lymph cells,
macrophages, dendritic cells, or tumor cells. 

Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be 
accomplished by the following procedures, for example, dextran-mediated transfection, calcium 
phosphate precipitation, polybrene mediated transfection, protoplast fusion, electroporation, 
encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into
nuclei, all well known in the art.

Polynucleotide and polypeptide pharmaceutical compositions 

The terms "polynucleotide" and "nucleic acid", used interchangeably herein, 

In addition to the pharmaceutically acceptable carriers and salts described above, the following
additional agents can be used with polynucleotide and/or polypeptide compositions.

A. Polypeptides

One example is polypeptides, which include, without limitation: asialoorosomucoid (ASOR);
transferrin; asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins;
interferons; granulocyte-macrophage colony stimulating factor (GM-CSF); granulocyte colony
stimulating factor (G-CSF); macrophage colony stimulating factor (M-CSF); stem cell factor;
and erythropoietin. Viral antigens, such as envelope proteins, can also be used. Also, proteins
from other invasive organisms can be used, such as the 17 amino acid peptide from the
circumsporozoite protein of Plasmodium falciparum known as RII.

B. Hormones, Vitamins, etc.

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens, 
thyroid hormone, or vitamins, folic acid. 

C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a
preferred embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-,
or polysaccharides can be included. In a preferred embodiment of this aspect, the
polysaccharide is dextran or DEAE-dextran. Also, chitosan and poly(lactide-co-glycolide) can be included.

D. Lipids, and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in
liposomes prior to delivery to the subject or to cells derived therefrom.

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or
entrap and retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can 
vary but will generally be around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a 
review of the use of liposomes as carriers for delivery of nucleic acids, see, Hug and Sleight 
(1991) Biochim. Biophys. Acta. 1097:1-17; Straubinger (1983) Meth. Enzymol. 101:512-527.

Liposomal preparations for use in the present invention include cationic (positively charged),
anionic (negatively charged) and neutral preparations. Cationic liposomes have been shown to
mediate intracellular delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA
purified transcription factors (Debs (1990) J. Biol. Chem. 265:10189-10192), in functional
form.
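
The condensed polynucleotide to lipid ratio mentioned above (around 1:1, mg DNA to micromoles lipid, or more of lipid) lends itself to a small worked calculation. The following Python sketch is a minimal illustration under that stated ratio; the function and parameter names are hypothetical, and the example quantities are not taken from the text.

```python
def lipid_micromoles(dna_mg, lipid_umol_per_mg_dna=1.0):
    """Return micromoles of lipid for dna_mg of condensed polynucleotide.

    The default of 1.0 reflects the "around 1:1 (mg DNA:micromoles lipid)"
    figure in the text; values above 1.0 correspond to "more of lipid".
    """
    if dna_mg < 0:
        raise ValueError("dna_mg must be non-negative")
    if lipid_umol_per_mg_dna < 1.0:
        raise ValueError("the text describes a ratio of about 1:1 or more of lipid")
    return dna_mg * lipid_umol_per_mg_dna


if __name__ == "__main__":
    # Hypothetical example quantities.
    print(lipid_micromoles(0.5))        # nominal 1:1 ratio
    print(lipid_micromoles(0.5, 2.0))   # lipid in two-fold excess
```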

Cationic liposomes are readily available. For example,
N-[1-(2,3-dioleyloxy)propyl]-N,N,N-triethylammonium (DOTMA) liposomes are available
under the trademark Lipofectin, from GIBCO BRL, Grand Island, NY. (See, also, Felgner
supra). Other commercially available liposomes include transfectace (DDAB/DOPE) and
DOTAP/DOPE (Boehringer). Other cationic liposomes can be prepared from readily available
materials using techniques well known in the art. See, eg. Szoka (1978) Proc. Natl. Acad. Sci.
USA 75:4194-4198; WO90/11092 for a description of the synthesis of DOTAP
(1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids 
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials 
include phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl 
choline (DOPC), dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine
(DOPE), among others. These materials can also be mixed with the DOTMA and DOTAP 
starting materials in appropriate ratios. Methods for making liposomes using these materials are 
well known in the art. 

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles
(SUVs), or large unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are
prepared using methods known in the art. See eg. Straubinger (1983) Meth. Enzymol.
101:512-527; Szoka (1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; Papahadjopoulos (1975)
Biochim. Biophys. Acta 394:483; Wilson (1979) Cell 17:77; Deamer & Bangham (1976)
Biochim. Biophys. Acta 443:629; Ostro (1977) Biochem. Biophys. Res. Commun. 76:836; Fraley
(1979) Proc. Natl. Acad. Sci. USA 76:3348; Enoch & Strittmatter (1979) Proc. Natl. Acad. Sci.
USA 76:145; Fraley (1980) J. Biol. Chem. 255:10431; Szoka & Papahadjopoulos (1978)
Proc. Natl. Acad. Sci. USA 75:145; and Schaefer-Ridder (1982) Science 215:166.

E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered. 
Examples of lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. 
Mutants, fragments, or fusions of these proteins can also be used. Also, modifications of
naturally occurring lipoproteins can be used, such as acetylated LDL. These lipoproteins can 
target the delivery of polynucleotides to cells expressing lipoprotein receptors. Preferably, if 
lipoproteins are included with the polynucleotide to be delivered, no other targeting ligand is
included in the composition. 

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are
known as apoproteins. At present, apoproteins A, B, C, D, and E have been isolated and
identified. At least two of these contain several proteins, designated by Roman numerals, AI,
AII, AIV; CI, CII, CIII.

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring
chylomicrons comprise A, B, C & E; over time these lipoproteins lose A and acquire C & E.
VLDL comprises A, B, C & E apoproteins, LDL comprises apoprotein B; and HDL comprises
apoproteins A, C, & E.

The amino acid sequences of these apoproteins are known and are described in, for example, Breslow
(1985) Annu Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986) J
Biol Chem 261:12918; Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann (1984)
Hum Genet 65:232.

Lipoproteins contain a variety of lipids including, triglycerides, cholesterol (free and esters), 
and phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For 
example, chylomicrons comprise mainly triglycerides. A more detailed description of the lipid 
content of naturally occurring lipoproteins can be found, for example, in Meth. Enzymol. 128
(1986). The composition of the lipids is chosen to aid in conformation of the apoprotein for
receptor binding activity. The composition of lipids can also be chosen to facilitate hydrophobic 
interaction and association with the polynucleotide binding molecule. 

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance.
Such methods are described in Meth. Enzymol. (supra); Pitas (1980) J. Biochem.
255:5454-5460; and Mahey (1979) J Clin. Invest 64:743-750. Lipoproteins can also be produced
by in vitro or recombinant methods by expression of the apoprotein genes in a desired host cell.
See, for example, Atkinson (1986) Annu Rev Biophys Chem 15:403 and Radding (1958)
Biochim Biophys Acta 30: 443. Lipoproteins can also be purchased from commercial suppliers,
such as Biomedical Technologies, Inc., Stoughton, Massachusetts, USA. Further description of
lipoproteins can be found in Zuckermann et al. PCT/US97/14465.

Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the 
desired polynucleotide/polypeptide to be delivered. 

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are
capable of neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired
location. These agents have in vitro, ex vivo, and in vivo applications. Polycationic agents
can be used to deliver nucleic acids to a living subject intramuscularly, subcutaneously,
etc.

The following are examples of useful polypeptides as polycationic agents: polylysine,
polyarginine, polyornithine, and protamine. Other examples include histones, protamines,
human serum albumin, DNA binding proteins, non-histone chromosomal proteins, and coat proteins
from DNA viruses, such as φX174. Transcriptional factors also contain domains that bind DNA
and therefore may be useful as nucleic acid condensing agents. Briefly, transcriptional factors
such as C/CEBP, c-jun, c-fos, AP-1, AP-2, AP-3, CPF, Prot-1, Sp-1, Oct-1, Oct-2, CREP, and
TFIID contain basic domains that bind DNA sequences.

Organic polycationic agents include: spermine, spermidine, and putrescine.

The dimensions and the physical properties of a polycationic agent can be extrapolated from
the list above, to construct other polypeptide polycationic agents or to produce synthetic
polycationic agents.

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene.
Lipofectin and lipofectAMINE are monomers that form polycationic complexes when
combined with polynucleotides/polypeptides.

Immunodiagnostic Assays

Streptococcus antigens of the invention can be used in immunoassays to detect antibody levels
(or, conversely, anti-Streptococcus antibodies can be used to detect antigen levels).
Immunoassays based on well defined, recombinant antigens can be developed to replace
invasive diagnostic methods. Antibodies to Streptococcus proteins within biological samples,
including for example, blood or serum samples, can be detected. Design of the immunoassays is
subject to a great deal of variation, and a variety of these are known in the art. Protocols for the
immunoassay may be based, for example, upon competition, or direct reaction, or sandwich
type assays. Protocols may also, for example, use solid supports, or may be by
immunoprecipitation. Most assays involve the use of labeled antibody or polypeptide; the labels
may be, for example, fluorescent, chemiluminescent, radioactive, or dye molecules. Assays
which amplify the signals from the probe are also known; examples are assays which
utilize biotin and avidin, and enzyme-labeled and mediated immunoassays, such as ELISA
assays.

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are 
constructed by packaging the appropriate materials, including the compositions of the 
invention, in suitable containers, along with the remaining reagents and materials (for example, 
suitable buffers, salt solutions, etc.) required for the conduct of the assay, as well as a suitable
set of assay instructions.

Use of Polypeptides to Screen for Peptide Analogs and Antagonists 

Polypeptides encoded by the instant polynucleotides and corresponding full length genes can be
used to screen peptide libraries to identify binding partners, such as receptors, from within the
library. Peptide libraries can be synthesized according to methods known in the art (e.g. US
patent 5,010,175; WO91/17823). Agonists or antagonists of the polypeptides of the invention
can be screened using any available method known in the art, such as signal transduction,
antibody binding, receptor binding, mitogenic assays, chemotaxis assays, etc. The assay
conditions ideally should resemble the conditions under which the native activity is exhibited in
vivo, that is, under physiologic pH, temperature, and ionic strength. Suitable agonists or
antagonists will exhibit strong inhibition or enhancement of the native activity at concentrations
that do not cause toxic side effects in the subject. Agonists or antagonists that compete for
binding to the native polypeptide can require concentrations equal to or greater than the native
concentration, while inhibitors capable of binding irreversibly to the polypeptide can be added
in concentrations on the order of the native concentration.

Such screening and experimentation can lead to identification of a polypeptide binding partner, 
such as a receptor, encoded by a gene or a cDNA corresponding to a polynucleotide described 
herein, and at least one peptide agonist or antagonist of the binding partner. Such agonists and 
antagonists can be used to modulate, enhance, or inhibit receptor function in cells to which the 
receptor is native, or in cells that possess the receptor as a result of genetic engineering. Further, 
if the receptor shares biologically important characteristics with a known receptor, information 
about agonist/antagonist binding can facilitate development of improved agonists/antagonists of 
the known receptor. 

Identification of anti-bacterial agents
Drug Screening Assays 

Of particular interest in the present invention is the identification of agents that have activity in 
modulating expression of one or more of the adhesion-specific genes described herein, so as to 
inhibit infection and/or disease. Of particular interest are screening assays for agents that have a 
low toxicity for human cells. 

The term "agent" as used herein describes any molecule with the capability of altering or
mimicking the expression or physiological function of a gene product of a differentially
expressed gene. Generally a plurality of assay mixtures are run in parallel with different agent
concentrations to obtain a differential response to the various concentrations. Typically, one of
these concentrations serves as a negative control, i.e. at zero concentration or below the level of
detection.

Candidate agents encompass numerous chemical classes, including, but not limited to, organic
molecules (e.g. small organic compounds having a molecular weight of more than 50 and less
than about 2,500 daltons), peptides, antisense polynucleotides, ribozymes, and the like.
Candidate agents can comprise functional groups necessary for structural interaction with
proteins, particularly hydrogen bonding, and typically include at least an amine, carbonyl,
hydroxyl or carboxyl group, preferably at least two of the functional chemical groups. The
candidate agents often comprise cyclical carbon or heterocyclic structures and/or aromatic or
polyaromatic structures substituted with one or more of the above functional groups. Candidate
agents are also found among biomolecules including, but not limited to: polynucleotides,
peptides, saccharides, fatty acids, steroids, purines, pyrimidines, derivatives, structural analogs
or combinations thereof.

Candidate agents are obtained from a wide variety of sources including libraries of synthetic or 
natural compounds. For example, numerous means are available for random and directed 
synthesis of a wide variety of organic compounds and biomolecules, including expression of 
randomized oligonucleotides and oligopeptides. Alternatively, libraries of natural compounds in 
the form of bacterial, fungal, plant and animal extracts are available or readily produced. 
Additionally, natural or synthetically produced libraries and compounds are readily modified 
through conventional chemical, physical and biochemical means, and may be used to produce 
combinatorial libraries. Known pharmacological agents may be subjected to directed or random 
chemical modifications, such as acylation, alkylation, esterification, amidification, etc. to 
produce structural analogs. 

Screening of Candidate Agents In Vitro 

A wide variety of in vitro assays may be used to screen candidate agents for the desired
biological activity, including, but not limited to, labeled in vitro protein-protein binding assays,
protein-DNA binding assays (e.g. to identify agents that affect expression), electrophoretic
mobility shift assays, immunoassays for protein binding, and the like. For example, by
providing for the production of large amounts of a differentially expressed polypeptide, one can
identify ligands or substrates that bind to, modulate or mimic the action of the polypeptide. The
purified polypeptide may also be used for determination of three-dimensional crystal structure,
which can be used for modeling intermolecular interactions, transcriptional regulation, etc.

The screening assay can be a binding assay, wherein one or more of the molecules may be
joined to a label, and the label directly or indirectly provides a detectable signal. Various labels
include radioisotopes, fluorescers, chemiluminescers, enzymes, specific binding molecules,
particles, e.g. magnetic particles, and the like. Specific binding molecules include pairs, such as
biotin and streptavidin, digoxin and antidigoxin, etc. For the specific binding members, the
complementary member would normally be labeled with a molecule that provides for detection,
in accordance with known procedures.

A variety of other reagents may be included in the screening assays described herein. Where the 
assay is a binding assay, these include reagents like salts, neutral proteins, e.g. albumin, 
detergents, etc. that are used to facilitate optimal protein-protein binding, protein-DNA binding, 
and/or reduce non-specific or background interactions. Reagents that improve the efficiency of 
the assay, such as protease inhibitors, nuclease inhibitors, anti-microbial agents, etc. may be 
used. The mixture of components is added in any order that provides for the requisite binding.
Incubations are performed at any suitable temperature, typically between 4 and 40°C.
Incubation periods are selected for optimum activity, but may also be optimized to facilitate
rapid high-throughput screening. Typically between 0.1 and 1 hour will be sufficient.

Many mammalian genes have homologs in yeast and lower animals. The study of such
homologs' physiological role and interactions with other proteins in vivo or in vitro can
facilitate understanding of biological function. In addition to model systems based on genetic
complementation, yeast has been shown to be a powerful tool for studying protein-protein
interactions through the two hybrid system.

Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by 
hydrogen bonding. Typically, one sequence will be fixed to a solid support and the other will be
free in solution. Then, the two sequences will be placed in contact with one another under
conditions that favor hydrogen bonding. Factors that affect this bonding include: the type and
volume of solvent; reaction temperature; time of hybridization; agitation; agents to block the
non-specific attachment of the liquid phase sequence to the solid support (Denhardt's reagent or
BLOTTO); concentration of the sequences; use of compounds to increase the rate of association
of sequences (dextran sulfate or polyethylene glycol); and the stringency of the washing
conditions following hybridization. See Sambrook et al. [supra] Volume 2, chapter 9, pages
9.47 to 9.57.

"Stringency" refers to conditions in a hybridization reaction that favor association of very 
similar sequences over sequences that differ. For example, the combination of temperature and
salt concentration should be chosen to be approximately 12° to 20°C below the calculated
Tm of the hybrid under study. The temperature and salt conditions can often be determined
empirically in preliminary experiments in which samples of genomic DNA immobilized on
filters are hybridized to the sequence of interest and then washed under conditions of different
stringencies. See Sambrook et al. at page 9.50.

Variables to consider when performing, for example, a Southern blot are (1) the complexity of
the DNA being blotted and (2) the homology between the probe and the sequences being
detected. The total amount of the fragment(s) to be studied can vary by a magnitude of 10, from
0.1 to 1 µg for a plasmid or phage digest to 10⁻⁹ to 10⁻⁸ g for a single copy gene in a highly
complex eukaryotic genome. For lower complexity polynucleotides, substantially shorter
blotting, hybridization, and exposure times, a smaller amount of starting polynucleotides, and
lower specific activity of probes can be used. For example, a single-copy yeast gene can be
detected with an exposure time of only 1 hour starting with 1 µg of yeast DNA, blotting for two
hours, and hybridizing for 4-8 hours with a probe of 10⁸ cpm/µg. For a single-copy mammalian
gene a conservative approach would start with 10 µg of DNA, blot overnight, and hybridize
overnight in the presence of 10% dextran sulfate using a probe of greater than 10⁸ cpm/µg,
resulting in an exposure time of ~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the 
probe and the fragment of interest, and consequently, the appropriate conditions for 
hybridization and washing. In many cases the probe is not 100% homologous to the fragment. 
Other commonly encountered variables include the length and total G+C content of the 
hybridizing sequences and the ionic strength and formamide content of the hybridization buffer. 
The effects of all of these factors can be approximated by a single equation:

Tm = 81 + 16.6(log10 Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch)

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base
pairs (slightly modified from Meinkoth & Wahl (1984) Anal. Biochem. 138: 267-284).
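
As an illustration only, the equation above can be evaluated with a short Python function. This is a minimal sketch: the function name, the parameter names, and the assumption that Ci is expressed in mol/L are choices made here and are not taken from the cited reference.

```python
import math


def melting_temperature(salt_molar, gc_percent, formamide_percent,
                        length_bp, mismatch_percent=0.0):
    """Approximate the Tm (in degrees C) of a DNA-DNA hybrid.

    Implements Tm = 81 + 16.6*log10(Ci) + 0.4*(%G+C) - 0.6*(%formamide)
                    - 600/n - 1.5*(%mismatch).
    Assumption: salt_molar is the monovalent ion concentration in mol/L.
    """
    if salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    if length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


if __name__ == "__main__":
    # Hypothetical inputs: 1 M salt, 50% G+C, 50% formamide, 600 bp hybrid.
    perfect = melting_temperature(1.0, 50.0, 50.0, 600)
    mismatched = melting_temperature(1.0, 50.0, 50.0, 600, mismatch_percent=10.0)
    print(f"perfect match: {perfect:.1f} C, 10% mismatch: {mismatched:.1f} C")
```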

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can
be conveniently altered. The temperature of the hybridization and washes and the salt
concentration during the washes are the simplest to adjust. As the temperature of the
hybridization increases (i.e. stringency), it becomes less likely for hybridization to occur
between strands that are nonhomologous, and as a result, background decreases. If the
radiolabeled probe is not completely homologous with the immobilized fragment (as is
frequently the case in gene family and interspecies hybridization experiments), the
hybridization temperature must be reduced, and background will increase. The temperature of
the washes affects the intensity of the hybridizing band and the degree of background in a
similar manner. The stringency of the washes is also increased with decreasing salt
concentrations.

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C
for a probe which is 95% to 100% homologous to the target fragment, 37°C for 90% to 95%
homology, and 32°C for 85% to 90% homology. For lower homologies, formamide content
should be lowered and temperature adjusted accordingly, using the equation above. If the
homology between the probe and the target fragment is not known, the simplest approach is to
start with both hybridization and wash conditions which are nonstringent. If non-specific bands
or high background are observed after autoradiography, the filter can be washed at high
stringency and reexposed. If the time required for exposure makes this approach impractical,
several hybridization and/or washing stringencies should be tested in parallel.
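
The temperature guidance in the preceding paragraph can be written as a small lookup, sketched here in Python. The function name is hypothetical, and because the stated ranges share their end points (95% and 90%), this sketch assumes that a boundary value is assigned to the higher-temperature band.

```python
def suggested_hybridization_temp_c(homology_percent):
    """Return a convenient hybridization temperature (degrees C) in 50% formamide.

    Follows the ranges stated in the text: 42 C for 95-100% homology,
    37 C for 90-95%, and 32 C for 85-90%. Returns None below 85%, where the
    text says to lower the formamide content and adjust the temperature
    using the Tm equation instead.
    Assumption: shared boundary values go to the higher-temperature band.
    """
    if not 0 <= homology_percent <= 100:
        raise ValueError("homology must be between 0 and 100 percent")
    if homology_percent >= 95:
        return 42
    if homology_percent >= 90:
        return 37
    if homology_percent >= 85:
        return 32
    return None


if __name__ == "__main__":
    for homology in (98, 92, 87, 70):
        print(homology, suggested_hybridization_temp_c(homology))
```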

Nucleic Acid Probe Assays 

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic
acid probes according to the invention can determine the presence of cDNA or mRNA. A probe
is said to "hybridize" with a sequence of the invention if it can form a duplex or
double-stranded complex, which is stable enough to be detected.

The nucleic acid probes will hybridize to the Streptococcus nucleotide sequences of the
invention (including both sense and antisense strands). Though many different nucleotide
sequences will encode the amino acid sequence, the native Streptococcal sequence is preferred
because it is the actual sequence present in cells. mRNA represents a coding sequence and so a
probe should be complementary to the coding sequence; single-stranded cDNA is
complementary to mRNA, and so a cDNA probe should be complementary to the non-coding
sequence.
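
The strand relationships described above can be illustrated with a short Python sketch that derives a probe complementary to a given coding-strand DNA sequence. The function name and the example sequence are hypothetical and are not taken from the sequences of the invention.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence, written 5' to 3'.

    A probe with this sequence can base-pair with the input strand.
    Only the unambiguous bases A, C, G and T are accepted.
    """
    invalid = set(sequence.upper()) - set("ACGT")
    if invalid:
        raise ValueError(f"unsupported characters: {sorted(invalid)}")
    return sequence.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    coding_fragment = "ATGGCC"  # hypothetical coding-strand fragment
    print(reverse_complement(coding_fragment))
```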

The probe sequence need not be identical to the Streptococcal sequence (or its complement);
some variation in the sequence and length can lead to increased assay sensitivity if the nucleic
acid probe can form a duplex with target nucleotides, which can be detected. Also, the nucleic
acid probe can include additional nucleotides to stabilize the formed duplex. Additional
Streptococcus sequence may also be helpful as a label to detect the formed duplex. For
example, a non-complementary nucleotide sequence may be attached to the 5' end of the probe,
with the remainder of the probe sequence being complementary to a Streptococcus sequence.
Alternatively, non-complementary bases or longer sequences can be interspersed into the probe,
provided that the probe sequence has sufficient complementarity with a Streptococcus
sequence in order to hybridize therewith and thereby form a duplex which can be detected.

The exact length and sequence of the probe will depend on the hybridization conditions (e.g.
temperature, salt condition, etc.). For example, for diagnostic applications, depending on the
complexity of the analyte sequence, the nucleic acid probe typically contains at least 10-20
nucleotides, preferably 15-25, and more preferably at least 30 nucleotides, although it may be
shorter than this. Short primers generally require cooler temperatures to form sufficiently stable
hybrid complexes with the template.

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al.
[J. Am. Chem. Soc. (1981) 103:3185], or according to Urdea et al. [Proc. Natl Acad. Sci. USA
(1983) 80: 7461], or using commercially available automated oligonucleotide synthesizers.

The chemical nature of the probe can be selected according to preference. For certain
applications, DNA or RNA are appropriate. For other applications, modifications may be
incorporated, e.g. backbone modifications, such as phosphorothioates or methylphosphonates,
can be used to increase in vivo half-life, alter RNA affinity, increase nuclease resistance, etc.
[e.g. see Agrawal & Iyer (1995) Curr Opin Biotechnol 6:12-19; Agrawal (1996) TIBTECH
14:376-387]; analogues such as peptide nucleic acids may also be used [e.g. see Corey (1997)
TIBTECH 15:224-229; Buchardt et al. (1993) TIBTECH 11:384-386].

Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting
small amounts of target nucleic acid. The assay is described in Mullis et al. [Meth. Enzymol.
(1987) 155:335-350] and US patents 4,683,195 and 4,683,202. Two "primer" nucleotides hybridize
with the target nucleic acids and are used to prime the reaction. The primers can comprise
sequence that does not hybridize to the sequence of the amplification target (or its complement)
to aid with duplex stability or, for example, to incorporate a convenient restriction site.
Typically, such sequence will flank the desired Streptococcus sequence.

A thermostable polymerase creates copies of target nucleic acids from the primers using the
original target nucleic acids as a template. After a threshold amount of target nucleic acids is
generated by the polymerase, they can be detected by more traditional methods, such as
Southern blots. When using the Southern blot method, the labelled probe will hybridize to the
Streptococcus sequence (or its complement).

Also, mRNA or cDNA can be detected by traditional blotting techniques described in 
Sambrook et al [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, 
can be purified and separated using gel electrophoresis. The nucleic acids on the gel are then 
blotted onto a solid support, such as nitrocellulose. The solid support is exposed to a labelled 
probe and then washed to remove any unhybridized probe. Next, the duplexes containing the 
labeled probe are detected. Typically, the probe is labelled with a radioactive moiety.

Table 1: Complete list of GBS predicted genes

ORF      Size (a.a.)  Annotation
SAG0001  453          chromosomal replication initiator protein DnaA
SAG0002  [illegible]  DNA polymerase III, beta subunit
SAG0003  293          diacylglycerol kinase catalytic domain protein, putative
SAG0004  [illegible]  conserved hypothetical protein
SAG0005  [illegible]  hypothetical protein
SAG0006  371          GTP-binding protein YchF
SAG0007  191          peptidyl-tRNA hydrolase
SAG0008  1165         transcription-repair coupling factor
SAG0009  31           hypothetical protein
SAG0010  90           S4 domain protein
SAG0011  123          cell division protein DivIC, putative
SAG0012  44           conserved hypothetical protein
SAG0013  428          protein of unknown function
SAG0014  424          MesJ/Ycf62 family protein
SAG0015  180          hypoxanthine-guanine phosphoribosyltransferase
SAG0016  658          cell division protein FtsH
SAG0017  447          pcsB protein
SAG0018  322          ribose-phosphate pyrophosphokinase
SAG0019  391          aminotransferase, class I
SAG0020  253          recombination protein O
SAG0021  283          protease, putative
SAG0022  330          fatty acid/phospholipid synthesis protein PlsX
SAG0023  79           acyl carrier protein
SAG0024  234          phosphoribosylaminoimidazole-succinocarboxamide synthase
SAG0025  1241         phosphoribosylformylglycinamidine synthase, putative
SAG0026  484          amidophosphoribosyltransferase
SAG0027  340          phosphoribosylformylglycinamidine cyclo-ligase
SAG0028  182          phosphoribosylglycinamide formyltransferase
SAG0029  250          acetyltransferase, GNAT family
SAG0030  515          phosphoribosylaminoimidazolecarboxamide formyltransferase/IMP cyclohydrolase
SAG0031  299          peptidase, M23/M37 family
SAG0032  434          group B streptococcal surface immunogenic protein
SAG0033  232          N-acetylmannosamine-6-P epimerase, putative
SAG0034  [illegible]  sugar ABC transporter, sugar-binding protein
SAG0035  295          sugar ABC transporter, permease protein
SAG0036  [illegible]  sugar ABC transporter, permease protein
SAG0037  147          conserved hypothetical protein
SAG0038  220          conserved hypothetical protein
SAG0039  305          N-acetylneuraminate lyase, putative
SAG0040  [illegible]  [illegible] family protein
SAG0041  325          acetyl xylan esterase, putative
SAG0042  267          phosphosugar-binding transcriptional regulator, RpiR family, putative
SAG0043  421          phosphoribosylamine--glycine ligase
SAG0044  162          phosphoribosylaminoimidazole carboxylase, catalytic subunit
SAG0045  363          phosphoribosylaminoimidazole carboxylase, ATPase subunit
SAG0046  463          membrane protein, putative

SAG0047  432          adenylosuccinate lyase
SAG0048  303          transcriptional regulator, Cro/CI family
SAG0049  332          Holliday junction DNA helicase RuvB
SAG0050  145          phosphotyrosine protein phosphatase, low molecular weight
SAG0051  126          MORN motif family protein
SAG0052  592          membrane protein, putative
SAG0053  880          aldehyde-alcohol dehydrogenase
SAG0054  338          alcohol dehydrogenase, propanol-preferring
SAG0055  496          threonine synthase
SAG0056  412          MATE efflux family protein
SAG0057  102          ribosomal protein S10
SAG0058  208          ribosomal protein L3
SAG0059  207          ribosomal protein L4
SAG0060  98           ribosomal protein L23
SAG0061  277          ribosomal protein L2
SAG0062  92           ribosomal protein S19
SAG0063  114          ribosomal protein L22
SAG0064  217          ribosomal protein S3
SAG0065  137          ribosomal protein L16
SAG0066  68           ribosomal protein L29
SAG0067  86           ribosomal protein S17
SAG0068  122          ribosomal protein L14
SAG0069  101          ribosomal protein L24
SAG0070  180          ribosomal protein L5
SAG0071  61           ribosomal protein S14, putative
SAG0072  132          ribosomal protein S8
SAG0073  178          ribosomal protein L6
SAG0074  118          ribosomal protein L18
SAG0075  164          ribosomal protein S5
SAG0076  59           ribosomal protein L30
SAG0077  146          ribosomal protein L15
SAG0078  434          preprotein translocase, SecY subunit
SAG0079  212          adenylate kinase
SAG0080  72           translation initiation factor IF-1
SAG0081  38           ribosomal protein L36
SAG0082  121          ribosomal protein S13
SAG0083  118          ribosomal protein S11
SAG0084  312          DNA-directed RNA polymerase, alpha subunit
SAG0085  128          ribosomal protein L17
SAG0086  85           lipoprotein, putative
SAG0087  [illegible]  hypothetical protein
SAG0088  56           hypothetical protein
SAG0089  183          conserved hypothetical protein
SAG0090  139          conserved hypothetical protein
SAG0091  144          transcriptional regulator ComX1, putative
SAG0092  230          phosphoglycerate mutase family protein
SAG0093  250          D-alanyl-D-alanine carboxypeptidase family protein
SAG0094  191          N-acetylmuramoyl-L-alanine amidase, family 4 protein


SAG0095  344          heat-inducible transcription repressor HrcA
SAG0096  [illegible]  heat shock protein GrpE
SAG0097  [illegible]  dnaK protein
SAG0098  [illegible]  dnaJ protein
SAG0099  415          transcriptional regulator, GntR family
SAG0100  258          tRNA pseudouridine synthase A
SAG0101  252          phosphomethylpyrimidine kinase, putative
SAG0102  154          conserved hypothetical protein
SAG0103  189          conserved hypothetical protein TIGR01440
SAG0104  280          conserved hypothetical protein
SAG0105  427          trigger factor
SAG0106  191          DNA-directed RNA polymerase, delta subunit, putative
SAG0107  534          CTP synthase
SAG0108  308          conserved hypothetical protein
SAG0109  148          deoxyuridine 5'-triphosphate nucleotidohydrolase
SAG0110  454          DNA repair protein RadA
SAG0111  165          carbonic anhydrase-related protein
SAG0112  439          pyridine nucleotide-disulphide oxidoreductase family protein
SAG0113  484          glutamyl-tRNA synthetase
SAG0114  322          ribose ABC transporter, periplasmic D-ribose-binding protein
SAG0115  310          ribose ABC transporter, permease protein
SAG0116  492          ribose ABC transporter, ATP-binding protein
SAG0117  132          ribose ABC transporter protein RbsD
SAG0118  303          ribokinase
SAG0119  328          ribose operon repressor RbsR
SAG0120  32           hypothetical protein
SAG0121  362          permease, putative
SAG0122  228          ABC transporter, ATP-binding protein
SAG0123  223          DNA-binding response regulator
SAG0124  356          sensor histidine kinase
SAG0125  396          argininosuccinate synthase
SAG0126  462          argininosuccinate lyase
SAG0127  [illegible]  fructose-bisphosphate aldolase
SAG0128  [illegible]  L-2-hydroxyisocaproate dehydrogenase
SAG0129  62           ribosomal protein L28
SAG0130  121          conserved hypothetical protein
SAG0131  543          DAK2 domain protein
SAG0132  294          SPFH domain/Band 7 family protein
SAG0133  38           conserved hypothetical protein
SAG0134  96           hypothetical protein
SAG0135  246          amino acid ABC transporter, ATP-binding protein
SAG0136  516          amino acid ABC transporter, amino acid-binding protein/permease protein
SAG0137  627          conserved hypothetical protein
SAG0138  279          undecaprenol kinase, putative
SAG0139  251          negative regulator of competence MecA, putative
SAG0140  386          glycosyl transferase, group 4 family protein
SAG0141  256          ABC transporter, ATP-binding protein


SAGO 142 


420 


conserved hypothetical protein 


SAG0143 


410 


selenocysteine lyase 


SAG0144 


147 


\T" XT T X* • 1 . • 

NifU family protein 


SAGO 145 


472 


1 « A 1 A_* 1 - A. - * 

conserved hypothetical protein 


SAG0146 


395 


penicillin-binding protein 4, putative 


SAG0147 


411 


D-alanyl-D-alanine carboxypeptidase family protein 


SAGO 148 


551 


oligopeptide ABC transporter, substrate-binding protein, putative 


SAG0149 


304 


oligopeptide ABC transporter, permease protein 


SAG0150 


343 


oligopeptide ABC transporter, permease protem 


SAG0151 


348 


oligopeptide ABC transporter, ATP-binding protein 


SAG0152 


310 


oligopeptide ABC transporter, ATP-binding protem 


SAG0153 


283 


A * • * ft . • ft ft A^ J ft * "ft-^. |« * A ft ft • 

4-diphosphocytidyl-2C-methyl-D-erythritol kinase 


SAG0154 


147 


adc operon repressor AdcR 


SAG0155 


236 


zinc ABC transporter, ATP-binding protein 


SAG0156 


270 


zinc ABC transporter, permease protein 


SAG0157 


NA 


deoxyribonuclease-related protein, degenerate 


SAG0158 


419 


tyrosyl-tRNA synthetase 


SAG0159 


765 


penicillin-binding protein IB, putative 


SAG0160 


1191 


DNA-directed RNA polymerase, beta subunit 


SAG0161 


1216 


DNA-directed RNA polymerase beta 1 subunit 


SAG0162 


121 


conserved hypothetical protein 


SAG0163 


323 


competence protein CglA 


SAG0164 


282 


competence protein CglB 


SAG0165 


151 


conserved hypothetical protein 


SAG0166 


123 


conserved domain protein 


SAGO 167 


324 


conserved hypothetical protein 


SAGO 168 


397 


acetate kinase 


SAG0169 


68 


transcnptional regulator, Cro/Cl family 


SAG0170


45 


hypothetical protein 


SAG0171 


151 


hypothetical protein 


SAG0172 


221 


protease, putative


SAG0173 


256 


pyrroline-5-carboxylate reductase


SAG0174 


355 


glutamyl-aminopeptidase


SAG0175 


79 


hypothetical protein 


SAG0176


94 


conserved hypothetical protein


SAG0177


107 


thioredoxin family protein


SAG0178


208 


tRNA binding domain protein


SAG0179


238 


conserved hypothetical protein


SAG0180


131 


single-strand binding protein


SAG0181 


214 


hydrolase, haloacid dehalogenase-like family 


SAG0182




sensor histidine kinase, putative 


SAG0183 


246 


response regulator 


SAG0184 


151 


conserved hypothetical protein 


SAG0185 


242 


membrane protein, putative 


SAG0186 


36 


hypothetical protein 


SAG0187


542 


oligopeptide ABC transporter, oligopeptide-binding protein 


SAG0188 


325


oligopeptide ABC transporter, permease protein 


SAG0189 


273 


oligopeptide ABC transporter, permease protein 



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Ann ota ti n 


SAG0190 


267 


peptide ABC transporter, ATP-binding protein


SAG0191 


2Uo 


peptide ABC transporter, ATP-binding protein


SAG0192


6/6 


PTS system, IIABC components


SAG0193


541 


alpha amylase family protein


SAG0194


639 


transcriptional antiterminator, BglG family 


SAG0195


377 


IS1548, transposase


SAG0196


66 


conserved domain protein 


SAG0197


94 


PTS system, IIB component, putative


SAG0198


451 


PTS system, IIC component, putative 


SAG0199


285 


transketolase, N-terminal subunit 


SAG0200 


309 


transketolase, C-terminal subunit 


SAG0201 


419 


oxidoreductase, putative 


SAG0202 


89 


ribosomal protein S15


SAG0203 


709 


polyribonucleotide nucleotidyltransferase 


SAG0204 


250 


conserved hypothetical protein 


SAG0205 


194


serine O-acetyltransferase 


SAG0206 


60 


lipoprotein, putative


SAG0207


447 


cysteinyl-tRNA synthetase 


SAG0208 


128 


conserved hypothetical protein 


SAG0209 


251


RNA methyltransferase, TrmH family, group 3


SAG0210 


172 


conserved hypothetical protein 


SAG0211 


286 


DegV family protein 


SAG0212 


32 


hypothetical protein 


SAG0213 


39 


hypothetical protein 


SAG0214 


148 


ribosomal protein L13


SAG0215 


130 


ribosomal protein S9


SAG0216 


33 


hypothetical protein 


SAG0217 


384 


site-specific recombinase, phage integrase family


SAG0218 


158 


transcriptional regulator, Cro/CI family 


SAG0219 


101 


hypothetical protein 


SAG0220 


92 


conserved hypothetical protein 


SAG0221 


76 


hypothetical protein 


SAG0222 


108 


conserved domain protein 


SAG0223 


209 


conserved hypothetical protein, fusion 


SAG0224


332 


replication initiation protein, putative


SAG0225


144 


hypothetical protein 


SAG0226


41c> 


recombination protein 


SAG0227 


156 


hypothetical protein 


SAG0228 


111


conserved hypothetical protein 


SAG0229 


95 


conserved hypothetical protein 






conserved hypothetical protein


SAG0231 


135


hypothetical protein 


SAG0232 


186 


hypothetical protein 


SAG0233 


226 


hypothetical protein 


SAG0234 


128 


hypothetical protein 


SAG0235 


93 


hypothetical protein 


SAG0236 


32 


hypothetical protein 


SAG0237 


34 


hypothetical protein 
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SAG0238 


41 


hypothetical protein


SAG0239 


ZOO 


transcriptional regulator, MutR family


SAG0240 


3V3 


transporter, putative


SAG0241 


213 


amino acid ABC transporter, permease protein


SAG0242 


308 


amino acid ABC transporter, amino acid-binding protein


SAG0243 


211 


amino acid ABC transporter, permease protein 


SAG0244 


381 


amino acid ABC transporter, ATP-binding protein


SAG0245 


152 


protein of unknown function/lipoprotein, putative 


SAG0246 


268 


hypothetical protein


SAG0247 


116 


hypothetical protein


SAG0248 


90 


hypothetical protein 


SAG0249 


116 


hypothetical protein 


SAG0250 


193 


membrane protein, putative 


SAG0251 


72 


transcriptional regulator, Cro/CI family 


SAG0252 


186 


acetyltransferase, GNAT family


SAG0253 


192 


acetyltransferase, GNAT family 


SAG0254 


226 


acetyltransferase, GNAT family 


SAG0255 


315 


conserved hypothetical protein


SAG0256 


163 


RNA polymerase sigma factor, ECF subfamily 


SAG0257 


53 


lipoprotein, putative 


SAG0258 


202 


transcriptional regulator, TetR family 


SAG0259 


365 


ABC transporter efflux protein, DrrB family, putative 


SAG0260 


238 


ABC transporter, ATP-binding protein 


SAG0261 


129 


IS1381, transposase OrfB


SAG0262 


127 


IS1381, transposase OrfA


SAG0263 


171 


hypothetical protein 


SAG0264 


103 


conserved hypothetical protein 


SAG0265 


235 


conserved hypothetical protein 


SAG0266 


382 


N-acetylglucosamine-6-phosphate deacetylase 


SAG0267 


180


conserved hypothetical protein 


SAG0268 


304 


glycyl-tRNA synthetase, alpha subunit


SAG0269 


213 


acyl carrier protein phosphodiesterase, putative 


SAG0270 


679 


glycyl-tRNA synthetase, beta subunit 


SAG0271 


85 


conserved hypothetical protein 


SAG0272


87 


membrane protein, putative 


SAG0273 


502 


glycerol kinase 


SAG0274 


609 


alpha-glycerophosphate oxidase 


SAG0275 


232 


glycerol uptake facilitator protein 


SAG0276 


445 


NADH oxidase, putative 


SAG0277 


476 


conserved hypothetical protein 




Qui 




SAG0279 


101 


conserved hypothetical protein 


SAG0280 


244 


ABC transporter, ATP-binding protein 


SAG0281 


534


membrane protein, putative 


SAG0282 


461 


PTS system, IIBC components 


SAG0283 


267 


glutamate 5-kinase 


SAG0284 


417


gamma-glutamyl phosphate reductase 


SAG0285 


298 


conserved hypothetical protein TIGR00006 
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(a.a.) 


Annotation


SAG0286 


108


cell division protein FtsL, putative 


SAG0287 


752 


penicillin-binding protein 2X 


SAG0288 


J3o 


phospho-N-acetylmuramoyl-pentapeptide-transferase 


SAG0289 


447 


ATP-dependent RNA helicase, DEAD/DEAH box family


SAG0290 


270 


ABC transporter, substrate-binding protein 


SAG0291 


267 


amino acid ABC transporter, permease protein 


SAG0292 


247 


amino acid ABC transporter, ATP-binding protein 


SAG0293 


74 


conserved hypothetical protein 


SAG0294 


304 


thioredoxin reductase 


SAG0295 


486 


conserved hypothetical protein 


SAG0296 


273 


NAD synthetase 


SAG0297 


444 


aminopeptidase C 


SAG0298 


750 


penicillin-binding protein 1A


SAG0299 


199 


recombination protein U 


SAG0300 


172 


conserved hypothetical protein 


SAG0301 


40 


hypothetical protein 


SAG0302 


110 


conserved hypothetical protein 


SAG0303 


384 


conserved hypothetical protein 


SAG0304 


487 


conserved hypothetical protein 


SAG0305 


160 


autoinducer-2 production protein LuxS


SAG0306 


535 


KH domain protein 


SAG0307 


33 


hypothetical protein 


SAG0308 


298 


ABC transporter, ATP-binding protein


SAG0309 


246 


ABC transporter, permease protein, putative


SAG0310 


361 


conserved hypothetical protein 


SAG0311 


NA 


DNA-binding response regulator, authentic point mutation 


SAG0312 


234 


conserved hypothetical protein 


SAG0313 


209 


guanylate kinase 


SAG0314 


104 


DNA-directed RNA polymerase, omega subunit, putative 


SAG0315 


796 


primosomal protein N' 


SAG0316 


311 


methionyl-tRNA formyltransferase 


SAG0317 


440 


sun protein


SAG0318 


245 


serine/threonine phosphatase, putative


SAG0319 


651 


serine/threonine protein kinase


SAG0320 


231


conserved hypothetical protein 


SAG0321 


339 


sensor histidine kinase, putative 


SAG0322 


213 


DNA-binding response regulator 


SAG0323 


466 


hydrolase, haloacid dehalogenase family/peptidyl-prolyl cis-trans 
isomerase, cyclophilin type 


SAG0324 


124


general stress protein, putative 


SAG0325




pyruvate formate-lyase-activating enzyme 


SAG0326 


251 


transcriptional regulator, DeoR family 


SAG0327 


327 


transcriptional regulator, putative 


SAG0328 


107 


PTS system, cellobiose-specific IIA component 


SAG0329 


106 


PTS system, cellobiose-specific IIB component 


SAG0330 


433 


PTS system, cellobiose-specific IIC component 


SAG0331 


818 


formate acetyltransferase 


SAG0332 


222 


transaldolase family protein 
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Size 
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A — A- — A* 

Annotation 


SAG0333 


Joz 


glycerol dehydrogenase 


SAG0334 




cysteine synthase A 


SAG0335


214 


conserved hypothetical protein TIGR00257


SAG0336 


42V 


helicase, putative 


SAG0337 


221 


competence protein F, putative 


SAG0338 


184 


ribosomal subunit interface protein 


SAG0339 


450 


aspartate kinase family protein 


SAG0340 


216 


hydrolase, haloacid dehalogenase-like family 


SAG0341 


49 


hypothetical protein 


SAG0342 


263 


enoyl-CoA hydratase/isomerase family protein 


SAG0343 


144 


transcriptional regulator, MarR family 


SAG0344 


323 


3-oxoacyl-(acyl-carrier-protein) synthase III


SAG0345 


74 


acyl carrier protein


SAG0346 


319 


enoyl-(acyl-carrier-protein) reductase II


SAG0347 


308 


malonyl CoA-acyl carrier protein transacylase


SAG0348 


244 


3-oxoacyl-[acyl-carrier protein] reductase


SAG0349 


410 


3-oxoacyl-(acyl-carrier-protein) synthase II


SAG0350 


166 


acetyl-CoA carboxylase, biotin carboxyl carrier protein


SAG0351 


140 


(3R)-hydroxymyristoyl-(acyl-carrier-protein) dehydratase


SAG0352 


456 


acetyl-CoA carboxylase, biotin carboxylase 


SAG0353 


291 


acetyl-CoA carboxylase, carboxyl transferase, beta subunit 


SAG0354 


257 


acetyl-CoA carboxylase, carboxyl transferase, alpha subunit 


SAG0355 


210 


conserved hypothetical protein


SAG0356 


425 


seryl-tRNA synthetase 


SAG0357 


330 


membrane protein, putative 


SAG0358 


120 


conserved hypothetical protein 


SAG0359 


303 


PTS system, mannose-specific IID component 


SAG0360 


270


PTS system, mannose-specific IIC component 


SAG0361 


336 


PTS system, mannose-specific IIAB components 


SAG0362


270 


hydrolase, haloacid dehalogenase-like family 




194 


hypothetical protein 


SAG0364


203 


membrane protein, putative 


SAG0365 


473 


xanthine/uracil permease family protein 


SAG0366


lov 


conserved hypothetical protein TIGR00150


SAG0367


loo 


acetyltransferase, GNAT family


SAG0368


435 


protein of unknown function 


SAG0369


9o 


conserved hypothetical protein 


SAG0370


139 


HIT family protein


SAG0371


167 


hypothetical protein 


SAG0372


85 


hypothetical protein 






ABC transporter, ATP-binding protein


SAG0374 


344 


ABC transporter, permease protein 


SAG0375 


266 


conserved hypothetical protein 


SAG0376 


211 


conserved hypothetical protein TIGR00091 


SAG0377 


127 


conserved hypothetical protein 


SAG0378 


379 


N utilization substance protein A 


SAG0379 


98 


conserved hypothetical protein 


SAG0380


100 


ribosomal protein L7A family 
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Table 1: Complet list of GBS predicted genes 



ORF 


Size 
(a.a.) 


Annotation


SAG0381 


927 


translation initiation factor IF-2


SAG0382 


122 


ribosome-binding factor A


SAG0383 


334 


protein of unknown function/lipoprotein, putative


SAG0384 


138 


transcriptional repressor CopY 


SAG0385 


744 


copper-transporter ATPase CopA 


SAG0386


68 


copper-transporter protein CopZ 


SAG0387 


204 


membrane protein, putative 


SAG0388 


270 


hydrolase, haloacid dehalogenase-like family


SAG0389 


880 


DNA polymerase I


SAG0390 


146 


CoA-binding domain protein


SAG0391 


159 


transcriptional regulator, Fur family


SAG0392 


521


cell wall surface anchor family protein


SAG0393 


228 


DNA-binding response regulator 


SAG0394 


345 


sensor histidine kinase 


SAG0395 


246 


membrane protein, putative 


SAG0396 


380 


queuine tRNA-ribosyltransferase 


SAG0397 


102 


conserved hypothetical protein 


SAG0398 


179


BioY family protein


SAG0399 


258 


AtsA/ElaC family protein 


SAG0400 


168 


cytidine/deoxycytidylate deaminase family protein 


SAG0401 


44 


hypothetical protein 


SAG0402 


449 


glucose-6-phosphate isomerase 


SAG0403 


175


5-formyltetrahydrofolate cyclo-ligase family protein


SAG0404 


225 


rhomboid family protein


SAG0405 


347 


protein of unknown function/lipoprotein, putative


SAG0406 


299 


UTP-glucose-1-phosphate uridylyltransferase


SAG0407 


338 


glycerol-3-phosphate dehydrogenase (NAD(P)+) 


SAG0408 


109


ribonuclease P protein component


SAG0409 


271 


SpoIIIJ family protein


SAG0410 


273 


R3H domain protein 


SAG0411


177 


conserved hypothetical protein 


SAG0412 


258 


recX protein 


SAG0413 


451 


RNA methyltransferase, TrmA family 


SAG0414 


153 


conserved hypothetical protein 


SAG0415 


142 


acetyltransferase, GNAT family 


SAG0416


1233 


protease, putative 


SAG0417 


302 


glycosyl transferase, group 2 family protein 


SAG0418 


336 


ribonucleoside-diphosphate reductase 2, beta subunit 


SAG0419


137 


nrdI protein


SAG0420


721


ribonucleoside-diphosphate reductase 2, alpha subunit




1 


cell wall surface anchor family protein


SAG0422 


129 


conserved hypothetical protein 


SAG0423 


132 


conserved domain protein 


SAG0424 


94 


hypothetical protein 


SAG0425 


105 


carboxymuconolactone decarboxylase family protein 


SAG0426 


131 


conserved hypothetical protein 


SAG0427 


129 


transcriptional regulator, MerR family 


SAG0428 


345 


alcohol dehydrogenase, zinc-containing 
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Table 1: Complete list f GBS predicted genes 



ORF 


Size 
(a.a.)


Annotation


SAG0429 


284 


oxidoreductase, aldo/keto reductase family 


SAG0430 


287 


cation efflux system protein


SAG0431


174 


transcriptional regulator, TetR family


SAG0432 


397 


transcriptional regulator, AraC family


SAG0433 


1389 


surface protein Rib 


SAG0434 


61 


transposase, IS256 family, truncation 


SAG0435 


97 


DNA-damage-inducible protein J, putative 


SAG0436 


62 


hypothetical protein


SAG0437 


123 


lipoprotein, putative 


SAG0438 


145 


bacteriophage L54a, integrase, truncation


SAG0439 


NA 


conserved hypothetical protein, degenerate 


SAG0440 


84 


conserved hypothetical protein 


SAG0441 


103 


conserved domain protein 


SAG0442 


189 


acetyltransferase, GNAT family 


SAG0443 


194 


acetyltransferase, GNAT family 


SAG0444 


188 


conserved hypothetical protein 


SAG0445 


883 


valyl-tRNA synthetase


SAG0446 


319 


oxidoreductase, Gfo/Idh/MocA family 


SAG0447 


287 


magnesium transporter, CorA family 


SAG0448 


391 


transposase, IS256 family 


SAG0449 


354 


conserved hypothetical protein 


SAG0450 


330 


aspartate--ammonia ligase


SAG0451 


149 


bacteriocin transport accessory protein, putative


SAG0452 


179 


type II DNA modification methyltransferase, putative 


SAG0453 


96 


hypothetical protein 


SAG0454 


161 


phosphopantetheine adenylyltransferase


SAG0455 


357 


conserved hypothetical protein 


SAG0456 


NA 


conserved hypothetical protein, degenerate 


SAG0457 


192 


conserved hypothetical protein


SAG0458 


368 


conserved hypothetical protein TIGR00048 


SAG0459 


171 


VanZF domain protein 


SAG0460


581 


ABC transporter, ATP-binding/permease protein 


SAG0461 


579 


ABC transporter, ATP-binding/permease protein 


SAG0462


188 


anthranilate synthase component II 


SAG0463


179 


BioY family protein 


SAG0464


330 


biotin synthetase


SAG0465


164 


hypothetical protein


SAG0466


371 


thiolase


SAG0467


409 


AMP-binding enzyme domain protein




210 


endonuclease III




ill 


type IV prepilin peptidase-related protein


SAG0470 


69 


conserved hypothetical protein 


SAG0471 


322 


glucokinase


SAG0472 


126 


rhodanese-like family protein 


SAG0473 


613 


elongation factor Tu family protein 


SAG0474 


81 


conserved hypothetical protein 


SAG0475 


451 


UDP-N-acetylmuramoylalanine--D-glutamate ligase


SAG0476 


358 


UDP-N-acetylglucosamine--N-acetylmuramyl-(pentapeptide)
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Table 1: C mplete list of GBS predicted genes 



ORF 


Size 
(a.a.)


Annotation






pyrophosphoryl-undecaprenol N-acetylglucosamine transferase


SAG0477


D iO 


cell division protein DivIB, putative


SAG0478


A'JQ 


cell division protein FtsA




*fZO 


cell division protein FtsZ


SAG0480


ZZ4 


ylmE protein, putative


SAG0481


Zui 


ylmF protein


SAG0482


o4 


YGGT family protein




zoz 


ylmH protein


SAG0484


ZjO 


cell division protein DivIVA, putative


SAG0485


nan 


isoleucyl-tRNA synthetase


SAG0486


1UU 


conserved hypothetical protein


SAG0487 


151 


MutT/nudix family protein 


SAG0488 


753 


ATP-dependent Clp protease, ATP-binding subunit


SAG0489 


34 


hypothetical protein 


SAG0490 


76 


conserved hypothetical protein 


SAG0491 


230 


amino acid ABC transporter, permease protein 


SAG0492 


244 


amino acid ABC transporter, ATP-binding protein


SAG0493 


564 


phosphoglucomutase/phosphomannomutase family protein 


SAG0494 


284 


methylenetetrahydrofolate
dehydrogenase/methenyltetrahydrofolate cyclohydrolase


SAG0495 


278 


protein of unknown function 


SAG0496


44o 


exodeoxyribonuclease VII, large subunit


SAG0497 


71 


exodeoxyribonuclease VII, small subunit


SAG0498 


Z9U 


geranyltranstransferase, putative


SAG0499 


Z73 


hemolysin A 


SAG0500


Id/ 


arginine repressor ArgR, putative 


SAG0501


DjZ 


DNA repair protein RecN


SAG0502


Z78 


DegV family protein


SAG0503


Z/y 


lipase/acylhydrolase


SAG0504


ZUU 


conserved hypothetical protein


SAG0505


yl 


DNA-binding protein HU


SAG0506




hypothetical protein


SAG0507


31U 


dihydroorotate dehydrogenase A


SAG0508


411 


beta-lactam resistance factor


SAG0509


4UJ 


beta-lactam resistance factor


SAG0510


4U0 


murM protein, putative


SAG0511


z/u 


hydrolase, haloacid dehalogenase-like family


SAG0512


43 o 


HD domain protein




1Z5 


conserved hypothetical protein


SAG0514


oV4 


cation-transporting ATPase, E1-E2 family


SAG0515


286 


conserved hypothetical protein


SAG0516 


643 


fructose-1,6-bisphosphatase, putative


SAG0517 


374 


iron-sulfur cluster-binding protein, putative 


SAG0518 


NA 


peptide chain release factor 2, programmed frameshift 


SAG0519


230 


cell division ABC transporter, ATP-binding protein FtsE


SAG0520 


309 


cell division ABC transporter, permease protein FtsX 


SAG0521 


236 


carboxymethylenebutenolidase-related protein 


SAG0522 


232 


metallo-beta-lactamase superfamily protein 
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Table 1: C mplete list of GBS predicted genes 



ORF 


Size 
(a.a.) 


Annotation 


SAG0523 


254 


oxidoreductase, short chain dehydrogenase/reductase family 


SAG0524 


835 


DNA polymerase III, epsilon subunit/ATP-dependent helicase
DinG 


SAG0525 


397 


aspartate aminotransferase 


SAG0526 


448 


asparaginyl-tRNA synthetase


SAG0527 


185 


conserved hypothetical protein 


SAG0528 


327 


inosine-uridine preferring nucleoside hydrolase 


SAG0529 


38 


hypothetical protein


SAG0530 


137 


OsmC/Ohr family protein 


SAG0531 


296 


conserved hypothetical protein 


SAG0532 


324 


conserved hypothetical protein


SAG0533 


303 


conserved hypothetical protein 


SAG0534 


465 


dipeptidase 


SAG0535 


506 


zinc ABC transporter, zinc-binding adhesion lipoprotein


SAG0536 


86 


ribosomal protein L31 


SAG0537 


311


DHH family protein 


SAG0538 


340 


adenosine deaminase, putative 


SAG0539 


147 


flavodoxin 


SAG0540 


91 


chorismate mutase, putative 


SAG0541 


398 


voltage-gated chloride channel family protein 


SAG0542 


127 


IS1381, transposase OrfA


SAG0543 


129 


IS1381, transposase OrfB 


SAG0544 


115 


ribosomal protein L19


SAG0545 


359 


prophage LambdaSa1, site-specific recombinase, phage integrase
family 


SAG0546 


67 


conserved domain protein 


SAG0547 


185 


hypothetical protein 


SAG0548 


265 


prophage LambdaSa1, repressor protein, putative


SAG0549 


147


hypothetical protein 


SAG0550 


74 


conserved hypothetical protein 


SAG0551 


52 


conserved hypothetical protein 


SAG0552 


62 


hypothetical protein 


SAG0553 


268 


hypothetical protein 


SAG0554 


63 


prophage LambdaSa1, transcriptional regulator, Cro/CI family


SAG0555 


249 


prophage LambdaSa1, antirepressor, putative


SAG0556 


47 


hypothetical protein 


SAG0557 


76 


hypothetical protein 


SAG0558 


74


hypothetical protein 


SAG0559 


286 


conserved hypothetical protein 


SAG0560 


77 


conserved hypothetical protein 


SAG0561 


46 


hypothetical protein 


SAG0562 


84 


hypothetical protein 


SAG0563 


53 


hypothetical protein 


SAG0564 


160 


conserved hypothetical protein 


SAG0565 


224 


conserved domain protein 


SAG0566 


138 


prophage LambdaSa1, single-strand binding protein


SAG0567 


439 


prophage LambdaSa1, reverse transcriptase/maturase family
protein 
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Table 1: Complete list of GBS predicted genes 



ORF 


Size 
(a.a.) 


Annotation


SAG0568


67 


conserved hypothetical protein 


SAG0569


158 


conserved hypothetical protein 


SAG0570 


115 


hypothetical protein 


SAG0571 


43 


hypothetical protein 


SAG0572 


138 


conserved hypothetical protein 


SAG0573 


54 


hypothetical protein 


SAG0574 


89 


conserved hypothetical protein 


SAG0575 


110 


hypothetical protein 


SAG0576 


43 


hypothetical protein 


SAG0577 


177 


conserved hypothetical protein 


SAG0578 


88 


conserved hypothetical protein 


SAG0579 


142 


conserved hypothetical protein 


SAG0580


111 


conserved hypothetical protein, truncation 


SAG0581


118 


conserved hypothetical protein 


SAG0582


422 


conserved hypothetical protein 


SAG0583 


406 


conserved hypothetical protein 


SAG0584 


62 


conserved hypothetical protein, truncation 


SAG0585


All 


conserved hypothetical protein 


SAG0586 


154 


conserved hypothetical protein 


SAG0587 


300 


prophage LambdaSa1, structural protein, putative


SAG0588 


71 


conserved hypothetical protein 


SAG0589 


143 


conserved hypothetical protein 


SAG0590 


112 


conserved hypothetical protein 


SAG0591 


78 


conserved hypothetical protein 


SAG0592 


111 


conserved hypothetical protein 


SAG0593 


185 


prophage LambdaSa1, structural protein


SAG0594 


81 


conserved hypothetical protein 


SAG0595 


123 


conserved hypothetical protein 


SAG0596 


670 


prophage LambdaSa1, pblA protein, internal deletion


SAG0597 


506 


prophage LambdaSa1, minor structural protein, putative


SAG0598 


1374 


prophage LambdaSa1, N-acetylmuramoyl-L-alanine amidase,
family 4


SAG0599 


668 


prophage LambdaSa1, minor structural protein, putative


SAG0600 


109 


hypothetical protein


SAG0601


70 


hypothetical protein


SAG0602


100 


conserved hypothetical protein


SAG0603


111 


conserved hypothetical protein 


SAG0604


239 


prophage LambdaSa1, lysin, putative


SAG0605


323 


conserved hypothetical protein 


SAG0606 


66 


conserved hypothetical protein 


SAG0607


JO 


conserved hypothetical protein


SAG0608 


59 


hypothetical protein 


SAG0609 


NA


prophage LambdaSa1, integrase, degenerate


SAG0610 


134 


conserved hypothetical protein 


SAG0611 


NA 


transposase, degenerate 


SAG0612 


53 


conserved hypothetical protein 


SAG0613 


425 


transmembrane protein Vexp1


SAG0614 


218 


ABC transporter, ATP-binding protein Vexp2 
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able 1: Complete list of GBS predicted g nes 



ORF 


Size
(a.a.)


Annotation




SAG0615 


458 


transmembrane protein Vexp3 


SAG0616 


217 


DNA-binding response regulator VncR 


SAG0617 


439 


sensor histidine kinase VncS


SAG0618 


195 


transposase OrfB, IS3 family, truncation 


SAG0619 


66 


conserved hypothetical protein 


SAG0620 


62 


hypothetical protein 


SAG0621 


401 


rod shape-determining protein RodA, putative


SAG0622 


186 


hydrolase, haloacid dehalogenase-like family


SAG0623 


650 


DNA gyrase, B subunit


SAG0624 


574 


septation ring formation regulator EzrA, putative 


SAG0625 


213 


phosphoserine phosphatase SerB 


SAG0626 


161


MutT/nudix family protein 


SAG0627 


151


conserved hypothetical protein 


SAG0628 


435 


enolase 


SAG0629 


354 


conserved domain protein


SAG0630 


427 


3-phosphoshikimate 1-carboxyvinyltransferase


SAG0631 


170 


shikimate kinase 


SAG0632 


457 


psr protein 


SAG0633 


451 


RNA methyltransferase, TrmA family 


SAG0634 


70 


hypothetical protein 


SAG0635 


245 


acid phosphatase, class B 


SAG0636 


172 


conserved hypothetical protein 


SAG0637 


NA 


transcriptional regulator, TetR family, putative, authentic
frameshift


SAG0638 


109 


cell wall surface anchor family protein, truncation 


SAG0639 


273 


transposase OrfB, IS3 family 


SAG0640 


91 


transposase OrfA, IS3 family 


SAG0641 


NA 


Tn5252, Orf 10 protein, degenerate 


SAG0642 


59 


hypothetical protein 


SAG0643


NA


chaperonin, 33 kDa, degenerate


SAG0644 


402 


transcriptional regulator, AraC family 


SAG0645 


554 


cell wall surface anchor family protein 


SAG0646 


307 


cell wall surface anchor family protein


SAG0647 


305 


sortase family protein 


SAG0648 


260 


sortase family protein 


SAG0649 


890 


cell wall surface anchor family protein, putative 


SAG0650 


189 


sortase family protein


SAG0651 


201 


protein of unknown function


SAG0652 


NA 


Tn5252, Orf 28 protein, degenerate


SAG0653 


NA 


conserved hypothetical protein, degenerate






hypothetical protein 


SAG0655 


57 


conserved hypothetical protein 


SAG0656 


36 


hypothetical protein 


SAG0657 


89 


hypothetical protein 


SAG0658 


383 


lipoprotein, putative 


SAG0659 


330 


ABC transporter, ATP-binding protein 


SAG0660 


272 


membrane protein 


SAG0661 


261 


conserved hypothetical protein 
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'able 1: C mplete list of GBS predicted genes 



ORF 


Size


Annotation




1U1 


cylX protein 


SAG0663




cylD protein


SAG0664


ZHU 


cylG protein 


SAG0665


1 n 1 


acyl carrier protein AcpC


SAG0666


1 « 


cylZ protein 


SAG0667


5\)y 


cylA protein 


SAG0668


zyz 


cylB protein 




00/ 


cylE protein 


SAG0670


31 / 


cylF protein


SAG0671


I5\ 


cylI protein


SAG0672


4U3 


cylJ protein 




iyi 


cylK protein 


SAG0674




hypothetical protein 


SAG0675


1/1 


putative secreted protein 


SAG0676


oo!> 


proteinase, putative 


SAG0677


luoZ 


hypothetical protein 


SAG0678


NA


endopeptidase O, degenerate 


SAG0679


343 


protein of unknown function 


SAG0680


339 


protein of unknown function


SAG0681


353 


conserved domain protein 


SAG0682




permease, putative 


SAG0683


NA


transmembrane protein Vexp3, putative, degenerate 


SAG0684


223 


ABC transporter, ATP-binding protein 


SAG0685


4/2 


conserved hypothetical protein 


SAG0686


2ol 


DNA-entry nuclease, putative 


SAG0687


212 


DedA family protein, putative 


SAG0688


Zlo 


ABC transporter, ATP-binding protein


SAG0689


ZD / 


membrane protein, putative 


SAG0690


2/2 


conserved hypothetical protein 


SAG0691


ZV4 


transcriptional regulator, LysR family 


SAG0692


iyj 


regulatory protein, putative 


SAG0693 


377 


IS1548, transposase


SAG0694


1 /J 


regulatory protein, putative, truncation 


SAG0695


33U 


D-lactate dehydrogenase 


SAG0696


DID 


sodium:galactoside symporter family protein, putative


SAG0697


341 


2-keto-3-deoxygluconate kinase


SAG0698


CQQ 


beta-glucuronidase 




ZZ3 


transcriptional regulator, GntR family


SAG0700


ZIO 


2-dehydro-3-deoxyphosphogluconate aldolase/4-hydroxy-2-
oxoglutarate aldolase


SAG0701 


466 


glucuronate isomerase


SAG0702 


348 


mannonate dehydratase 


SAG0703 


279 


D-mannonate oxidoreductase 


SAG0704 


270


hydrolase, haloacid dehalogenase-like family 


SAG0705 


596 


glycosyl hydrolase, family 3 


SAG0706 


361 


proline dipeptidase 


SAG0707 


334 


transcriptional regulator, RegM family 


SAG0708 


488 


alpha amylase family protein


SAG0709


j3Z 


glycosyl transferase, group 1 family protein 


SAG0710


/I /I /I 


glycosyl transferase, group 1 family protein 


SAG0711


£A7 
OH- / 


threonyl-tRNA synthetase


SAG0712


71A 


DNA-binding response regulator 


SAG0713


TIG 
33y 


conserved hypothetical protein 


SAG0714


loo 


conserved hypothetical protein 


SAG0715


OK 
ZlO 


amino acid ABC transporter, permease protein 


SAG0716


ZjI 


amino acid ABC transporter, permease protein 


SAG0717


OA£ 
ZOO 


amino acid ABC transporter, amino acid-binding protein


SAG0718


O^l 
ZD1 


amino acid ABC transporter, ATP-binding protein


SAG0719


ZJO


DNA-binding response regulator 


SAG0720


44y 


sensory box histidine kinase 


SAG0721


zoy 


metallo-beta-lactamase superfamily protein 


SAG0722


1 oo 

1ZZ 


conserved hypothetical protein 


SAG0723


Z3o 


ribonuclease III


SAG0724


1 1 on 

ii/y 


chromosome segregation SMC protein 


SAG0725


ZoD 


hydrolase, haloacid dehalogenase-like family 


SAG0726


z/4 


hydrolase, haloacid dehalogenase-like family


SAG0727


536 


signal recognition particle-docking protein FtsY


SAG0728


OOA 

270 


ABC transporter, substrate-binding protein 


SAG0729


3UU 


ABC transporter, permease protein, putative 


SAG0730


/IO 
4Z 


ABC transporter, ATP-binding protein


SAG0731


34/ 


bacterial luciferase family protein


SAG0732


OOA 

/zu 


transcriptional accessory protein Tex, putative 


SAG0733


14Z 


conserved hypothetical protein 


SAG0734


Q7 

0/ 


phage shock protein C, putative 


SAG0735


A/1 

44 


hypothetical protein 


SAG0736


111 

1 


HPr(Ser) kinase/phosphatase


SAG0737


7^7 
ZD / 


prolipoprotein diacylglyceryl transferase 


SAG0738


13Z 


conserved hypothetical protein 


SAG0739


1 A1 

1*0 


conserved hypothetical protein 


SAG0740


Q1 

71 


conserved hypothetical protein 


SAG0741


1A1 
DUj 


peptidase, U32 family, putative 


SAG0742


AO ft 
H-Zo 


peptidase, U32 family 


SAG0743


7A 
/U 


conserved hypothetical protein


SAG0744


OA-*
ZOD 


membrane protein, putative 


SAG0745


*HfO 


Mn2+/Fe2+ transporter, NRAMP family


SAG0746


Doy 


riboflavin biosynthesis protein RibD


SAG0747


OAQ 
ZUo 


riboflavin synthase, alpha subunit 


SAG0748


107 


riboflavin biosynthesis protein RibA 


SAG0749 


156 


riboflavin synthase, beta subunit


SAG0750 


496 


lysyl-tRNA synthetase 


SAG0751 


300 


hydrolase, haloacid dehalogenase-like family 


SAG0752 


213 


phosphoglycerate mutase family protein 


SAG0753 


157 


ebsC family protein, putative 


SAG0754 


205 


conserved domain protein 


SAG0755 


282 


peptidase, U32 family 


SAG0756 


174 


conserved hypothetical protein


Table 1: Complete list of GBS predicted genes


ORF


Size
(a.a.)


Annotation


SAG0757


129 


protein of unknown function/lipoprotein, putative


SAG0758 


599 


oligoendopeptidase F, putative


SAG0759 


931 


phosphoenolpyruvate carboxylase 


SAG0760 


377 


IS1548, transposase


SAG0761 


422 


cell division protein, FtsW/RodA/SpoVE family


SAG0762 


398 


translation elongation factor Tu 


SAG0763 


252 


triosephosphate isomerase 


SAG0764 


230 


phosphoglycerate mutase family protein 


SAG0765 


681 


penicillin-binding protein 2b 


SAG0766 


198 


recombination protein RecR 


SAG0767 


348 


D-alanine-D-alanine ligase 


SAG0768 


455 


UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopi 
D-alanyl-D-alanyl ligase 


SAG0769 


406 


oxalate:formate antiporter 


SAG0770 


228 


membrane protein, putative 


SAG0771 


512 


cell wall surface anchor family protein 


SAG0772 


514 


peptide chain release factor 3 


SAG0773 


126 


conserved hypothetical protein 


SAG0774 


244 


ABC transporter, ATP-binding protein 


SAG0775 


220 


ABC transporter, permease protein 


SAG0776 


276 


YaeC family protein, putative 


SAG0777 


528 


ATP-dependent RNA helicase, DEAD/DEAH box family 


SAG0778 


88 


conserved hypothetical protein 


SAG0779 


254 


conserved hypothetical protein 


SAG0780 


246 


acyltransferase family protein 


SAG0781 


217 


competence protein CelA 


SAG0782 


745 


DNA internalization-related competence protein ComEC/Rec2


SAG0783 


269


hydrolase, haloacid dehalogenase-like family


SAG0784


314


sugar-binding transcriptional regulator, LacI family


SAG0785 


330 


conserved hypothetical protein


SAG0786 


242 


conserved domain protein


SAG0787 


345 


DNA polymerase III, delta subunit, putative 


SAG0788 


202 


superoxide dismutase, Fe-Mn 


SAG0789 


283 


transcriptional antiterminator LicT


SAG0790 


622 


PTS system, beta-glucosides-specific IIABC components 


SAG0791 


475 


6-phospho-beta-glucosidase 


SAG0792 


364 


conserved hypothetical protein


SAG0793 


380 


glycerate kinase 2 


SAG0794 


418 


permease, GntP family 


SAG0795 


354 


conserved hypothetical protein


SAG0796


14/


transcriptional regulator, MarR family 


SAG0797 


342


S-adenosylmethionine:tRNA ribosyltransferase-isomerase


SAG0798 


226 


membrane protein, putative 


SAG0799 


233


glucosamine-6-phosphate isomerase 


SAG0800


318 


glutathione S-transferase family protein 


SAG0801 


239 


ribosomal small subunit pseudouridine synthase A 


SAG0802 


38 


hypothetical protein 


SAG0803 


383 


major facilitator family protein


SAG0804




competence protein CoiA


SAG0805


Ol/l 


oligoendopeptidase F


SAG0806


9 OR 


hydrolase, haloacid dehalogenase-like family


SAG0807


9^


O-methyltransferase family protein


SAG0808




protease maturation protein, putative


SAG0809


1 ^1
101 


conserved hypothetical protein


SAG0810


R99 
5 / Z 


alanyl-tRNA synthetase


SAG0811


9**R 


membrane protein, putative


SAG0812


9*79 
Z /Z 


glycosyl transferase, family 8


SAG0813


RI 
51 


hypothetical protein


SAG0814


20


conserved hypothetical protein 


SAG0815


*71 
/ 1 


transcriptional regulator, Cro/CI family 


SAG0816


ZD^ 


membrane protein, putative 


SAG0817


io/ 


membrane protein, putative 


SAG0818




ribonucleoside-diphosphate reductase 2, beta subunit


SAG0819


*71 O


ribonucleoside-diphosphate reductase 2, alpha subunit 


SAG0820


HA 
74 


ribonucleoside-diphosphate reductase 2, NrdH-redoxin 


SAG0821


0*7 


phosphocarrier protein HPr 


SAG0822


C7*7 
3// 


phosphoenolpyruvate-protein phosphotransferase 


SAG0823


4/3 


glyceraldehyde-3-phosphate dehydrogenase, NADP-dependent


SAG0824


ill *7 

41 / 


polysaccharide deacetylase family protein 


SAG0825




ATP-dependent RNA helicase, DEAD/DEAH box family


SAG0826


OAQ 

zuy 


uridine kinase 


SAG0827


lOD 


conserved hypothetical protein 


SAG0828


^ZA 


DNA polymerase III, gamma and tau subunits 


SAG0829


54 


conserved hypothetical protein 


SAG0830


311 


biotin--acetyl-CoA-carboxylase ligase


SAG0831


IOC 


S-adenosylmethionine synthetase


SAG0832


/DJ 


protein of unknown function


SAG0833


1 91 
151 


hypothetical protein


SAG0834


A9 
4Z 


hypothetical protein


SAG0835


1 88
155 


conserved hypothetical protein


SAG0836


1 8A 
154 


conserved hypothetical protein 


SAG0837


A98 
4Zo 


ABC transporter, ATP-binding protein


SAG0838


9^1 


hypothetical protein


SAG0839


99A 
ZZO 


transcriptional regulator, TenA family


SAG0840


ZDj


phosphomethylpyrimidine kinase


SAG0841


9^£ 

ZOO 


hydroxyethylthiazole kinase


SAG0842


991 
ZZJ 


thiamine-phosphate pyrophosphorylase


SAG0843


UDP-N-acetylglucosamine 1-carboxyvinyltransferase


SAG0844 


184 


acetyltransferase, GNAT family


SAG0845 


427 


CBS domain protein 


SAG0846 


286 


methionine aminopeptidase, type I 


SAG0847 


306 


ribonuclease BN, putative 


SAG0848 


151 


GtrA family protein 


SAG0849 


169 


conserved hypothetical protein 


SAG0850 


652 


DNA ligase, NAD-dependent 


SAG0851 


339 


bmrU protein, putative


SAG0852


/OO 


pullulanase, putative


SAG0853


022 


1,4-alpha-glucan branching enzyme 


SAG0854




glucose-1-phosphate adenylyltransferase


SAG0855


NA


glycogen biosynthesis protein GlgD, authentic frameshift 


SAG0856


4/0 


glycogen synthase 


SAG0857


_£_£ 
oo 


ATP synthase F0, C subunit 


SAG0858


238 


ATP synthase F0, A subunit


SAG0859


1 _CC 

lo5 


ATP synthase F0, B subunit


SAG0860


1 /o 


ATP synthase F1, delta subunit


SAG0861


501 


ATP synthase F1, alpha subunit


SAG0862


ATP synthase F1, gamma subunit


SAG0863


468


ATP synthase F1, beta subunit


SAG0864


137 


ATP synthase F1, epsilon subunit


SAG0865


76 


conserved hypothetical protein 


SAG0866 


423 


UDP-N-acetylglucosamine 1-carboxyvinyltransferase


SAG0867 


63 


conserved hypothetical protein 


SAG0868 


285 


DNA-entry nuclease 


SAG0869 


346 


phenylalanyl-tRNA synthetase, alpha subunit 


SAG0870 


173 


acetyltransferase, GNAT family 


SAG0871 


801 


phenylalanyl-tRNA synthetase, beta subunit


SAG0872 


300 


conserved hypothetical protein 


SAG0873 


1077 


exonuclease RexB 


SAG0874


1207 


exonuclease RexA 


SAG0875 


305 


magnesium transporter, CorA family, putative


SAG0876 


458 


tRNA modification GTPase TrmE


SAG0877 


636 


ABC transporter, ATP-binding protein 


SAG0878 


322


acetoin dehydrogenase, thymine PPi dependent, E1 component,
alpha subunit 


SAG0879 


332 


acetoin dehydrogenase, thymine PPi dependent, E1 component,
beta subunit 


SAG0880


462 


acetoin dehydrogenase, thymine PPi dependent, E2 component,
dihydrolipoamide acetyltransferase


SAG0881


585


acetoin dehydrogenase, thymine PPi dependent, E3 component,
dihydrolipoamide dehydrogenase


SAG0882


329 


lipoate-protein ligase A 


SAG0883


251 


cobyric acid synthase, putative 


SAG0884


447 


Mur ligase family protein


SAG0885


283 


conserved hypothetical protein TIGR00159 


SAG0886


319 


protein of unknown function


SAG0887


450


phosphoglucomutase/phosphomannomutase family protein


SAG0888 




conserved hypothetical protein


SAG0889 


126 


conserved hypothetical protein


SAG0890 


376 


oxygen-independent coproporphyrinogen III oxidase, putative


SAG0891 


245 


conserved hypothetical protein


SAG0892 


256 


hydrolase, haloacid dehalogenase-like family 


SAG0893 


218 


conserved hypothetical protein 


SAG0894 


137Q 


protein of unknown function 


SAG0895 


289 


lipoyl-binding domain protein


SAG0896


108 


oxidoreductase, putative 


SAG0897 


221 


conserved hypothetical protein 


SAG0898 


83 


hypothetical protein 


SAG0899 


57 


hypothetical protein 


SAG0900 


56 


hypothetical protein 


SAG0901 


127 


hypothetical protein 


SAG0902 


45 


hypothetical protein 


SAG0903 


44 


hypothetical protein 


SAG0904 


56 


hypothetical protein 


SAG0905 


138 


nucleoside diphosphate kinase 


SAG0906 


610 


GTP-binding protein LepA 


SAG0907 


877 


protein of unknown function/lipoprotein, putative 


SAG0908 


203 


HD domain protein 


SAG0909 


154 


acetyltransferase, GNAT family 


SAG0910 


144 


PilB-related protein 


SAG0911 


930 


cation-transporting ATPase, E1-E2 family 


SAG0912 


367 


nucleoside diphosphate kinase domain protein 


SAG0913 


212 


chloramphenicol acetyltransferase 


SAG0914 


203 


conserved hypothetical protein 


SAG0915 


405 


Tn916, transposase 


SAG0916 


67 


Tn916, excisionase 


SAG0917 


83 


Tn916, hypothetical protein


SAG0918 


76 


Tn916, hypothetical protein 


SAG0919 


157 


Tn916, hypothetical protein 


SAG0920 


23 


Tn916, hypothetical protein 


SAG0921 


117 


Tn916, transcriptional regulator, putative 


SAG0922 


61 


Tn916, hypothetical protein


SAG0923 


639 


Tn916, tetracycline resistance protein


SAG0924 


28 


Tn916, tetM leader peptide 


SAG0925 


310 


Tn916, hypothetical protein


SAG0926 


333 


Tn916, NLP/P60 family protein 


SAG0927 


725 


membrane protein, putative 


SAG0928 


NA 


Tn916, hypothetical protein, authentic frameshift


SAG0929 


168 


Tn916, hypothetical protein 


SAG0930 


165 


Tn916, hypothetical protein 


SAG0931 


73 


Tn916, hypothetical protein 


SAG0932 


401 


Tn916, transcriptional regulator, putative


SAG0933 


461 


Tn916, FtsK/SpoIIIE family protein 


SAG0934 


128


Tn916, hypothetical protein 


SAG0935 


104 


Tn916, hypothetical protein


SAG0936


Tn916, hypothetical protein


SAG0937 


NA 


ABC transporter, ATP-binding protein, authentic frameshift 


SAG0938 


122


transcriptional regulator, GntR family 


SAG0939 


1034 


DNA polymerase III, alpha subunit 


SAG0940 


340 


6-phosphofructokinase 


SAG0941 


500 


pyruvate kinase 


SAG0942 


185 


signal peptidase I, putative 


SAG0943 


47 


hypothetical protein


SAG0944


604 


glucosamine-fructose-6-phosphate aminotransferase, isomerizing


SAG0945 


377 


IS1548, transposase


SAG0946 


109 


phnA protein 


SAG0947 


213 


amino acid ABC transporter, permease protein 


SAG0948


209 


amino acid ABC transporter, ATP-binding protein 


SAG0949 


276 


amino acid ABC transporter, amino acid-binding protein 


SAG0950 


82 


ribosomal protein S20 


SAG0951 


306 


pantothenate kinase 


SAG0952 


196 


conserved hypothetical protein 


SAG0953 


129 


cytidine deaminase 


SAG0954 


349


protein of unknown function/lipoprotein, putative 


SAG0955 


511 


sugar ABC transporter, ATP-binding protein 


SAG0956 


353 


sugar ABC transporter, permease protein, putative 


SAG0957 


318 


sugar ABC transporter, permease protein, putative 


SAG0958


456 


NADH oxidase 


SAG0959 


329 


L-lactate dehydrogenase 


SAG0960 


819 


DNA gyrase, A subunit 


SAG0961 


247 


sortase SrtA 


SAG0962 


137 


glyoxylase family protein 


SAG0963 


320 


conserved hypothetical protein 


SAG0964 


375 


Na+/H+ exchanger family protein 


SAG0965 


127 


IS1381, transposase OrfA 


SAG0966 


129 


IS1381, transposase OrfB 


SAG0967 


520 


GMP synthase 


SAG0968 


232 


transcriptional regulator, GntR family 


SAG0969 


444 


gid protein 


SAG0970 


247 


acetyltransferase, GNAT family 


SAG0971 


282 


protein of unknown function/lipoprotein, putative 


SAG0972 


NA 


conserved hypothetical protein, authentic frameshift


SAG0973 


320 


nisin-resistance protein, putative 


SAG0974 


250 


ABC transporter, ATP-binding protein 


SAG0975 


651 


ABC transporter, permease protein, putative 


SAG0976 


222 


DNA-binding response regulator 


SAG0977 


312 


sensor histidine kinase 


SAG0978 


356 


site-specific recombinase, phage integrase family 


SAG0979 


553 


ABC transporter, substrate-binding protein 


SAG0980 


257 


conserved hypothetical protein 


SAG0981 


228 


satD protein 


SAG0982 


521 


signal recognition particle protein Ffh 


SAG0983 


110 


conserved hypothetical protein 


SAG0984 


437


sensor histidine kinase CiaH


SAG0985


226 


DNA-binding response regulator CiaR


SAG0986 


849 


aminopeptidase N 


SAG0987 


217 


phosphate transport system regulatory protein PhoU


SAG0988 


252 


phosphate ABC transporter, ATP-binding protein PstB, putative 


SAG0989 


267 


phosphate ABC transporter, ATP-binding protein PstB, putative


SAG0990 


295 


phosphate ABC transporter, permease protein PstA, putative


SAG0991 


305


phosphate ABC transporter, permease protein


SAG0992


286 


phosphate ABC transporter, phosphate-binding protein 


SAG0993 


436 


NOL1/NOP2/sun family protein


SAG0994 


254 


inositol monophosphatase family protein 


SAG0995 


93 


conserved hypothetical protein 


SAG0996 


137 


conserved hypothetical protein 


SAG0997 


310 


macrolide-efflux protein mreA/riboflavin biosynthesis protein 
RibF 


SAG0998 


294 


tRNA pseudouridine synthase B


SAG0999 


143 


acetyltransferase, GNAT family


SAG1000 


423 


conserved hypothetical protein 


SAG1001 


196 


conserved hypothetical protein 


SAG1002 


292 


protease, putative 


SAG1003 


876 


permease, putative 


SAG1004 


233 


ABC transporter, ATP-binding protein 


SAG1005 


706 


DNA topoisomerase I 


SAG1006 


280 


DprA/SMF protein, putative DNA processing factor 


SAG1007 


342 


iron-compound ABC transporter, iron-compound-binding protein 


SAG1008 


253 


iron compound ABC transporter, ATP-binding protein 


SAG1009 


324 


iron compound ABC transporter, permease protein 


SAG1010 


320 


iron compound ABC transporter, permease protein 


SAG1011 


182 


acetyltransferase, CysE/LacA/LpxA/NodL family 


SAG1012 


253 


ribonuclease HIII


SAG1013 


283 


GTP-binding protein 


SAG1014 


190 


conserved hypothetical protein 


SAG1015 


494 


carbon starvation protein CstA, putative 


SAG1016 


244 


response regulator 


SAG1017 


579 


sensor histidine kinase, putative


SAG1018 


40 


lipoprotein, putative 


SAG1019 


39 


hypothetical protein 


SAG1020


227 


lipoprotein, putative 


SAG1021 


107 


hypothetical protein 


SAG1022


177 


hypothetical protein 


SAG1023


48 


hypothetical protein 


SAG1024


183 


lipoprotein, putative 


SAG1025 


149 


hypothetical protein 


SAG1026


NA


immunogenic secreted protein, degenerate 


SAG1027


84 


conserved hypothetical protein 


SAG1028


196


hypothetical protein


SAG1029


101


hypothetical protein


SAG1030 


304 


protein of unknown function


SAG1031


ion


conserved domain protein 


SAG1032 


85 


conserved hypothetical protein 


SAG1033 


1309 


FtsK/SpoIIIE family protein


SAG1034 


55 


hypothetical protein 


SAG1035 


424 


conserved hypothetical protein 


SAG1036 


80 


conserved hypothetical protein 


SAG1037 


157 


hypothetical protein 


SAG1038 


1003 


phage infection protein, putative


SAG1039 


96 


conserved hypothetical protein 


SAG1040 


260 


conserved domain protein 


SAG1041 


107 


hypothetical protein 


SAG1042 


1060 


carbamoyl-phosphate synthase, large subunit 


SAG1043 


358 


carbamoyl-phosphate synthase, small subunit 


SAG1044 


307 


aspartate carbamoyltransferase 


SAG1045 


430 


dihydroorotase, multifunctional complex type 


SAG1046 


209 


orotate phosphoribosyltransferase


SAG1047 


233 


orotidine 5'-phosphate decarboxylase


SAG1048


410 


membrane protein, putative


SAG1049 


513 


ABC transporter, ATP-binding protein


SAG1050 


112 


ribonucleotide reductase, truncation


SAG1051 


358 


aspartate-semialdehyde dehydrogenase


SAG1052 


47 


cell wall surface anchor family protein, putative


SAG1053 


30 


hypothetical protein


SAG1054 


531 


cardiolipin synthetase


SAG1055 


556 


formate--tetrahydrofolate ligase


SAG1056 


339 


lipoate-protein ligase A


SAG1057 


292 


conserved hypothetical protein


SAG1058 


272 


conserved hypothetical protein


SAG1059 


110 


glycine cleavage system H protein, putative


SAG1060 


328 


bacterial luciferase family protein


SAG1061 


399 


oxidoreductase, FMN-binding


SAG1062 


282 


lipoate-protein ligase A family protein


SAG1063 


228 


flavoprotein-related protein


SAG1064 


180 


flavoprotein family protein 


SAG1065 


190 


membrane protein, putative


SAG1066 


572 


phosphoglucomutase


SAG1067 


178 


IS861, transposase OrfA 


SAG1068 


277


IS861, transposase OrfB 


SAG1069 


65 


hypothetical protein 


SAG1070 


577 


ABC transporter, ATP-binding/permease protein 


SAG1071 


573 


ABC transporter, ATP-binding/permease protein


SAG1072 


200 


conserved hypothetical protein 


SAG1073 


325 


conserved hypothetical protein 


SAG1074 


418 


serine hydroxymethyltransferase 


SAG1075


183 


Sua5/YciO/YrdC/YwlC family protein 


SAG1076 


276 


modification methylase, HemK family


SAG1077 


359 


peptide chain release factor 1 


SAG1078 


189 


thymidine kinase


SAG1079 


60 


4-oxalocrotonate tautomerase 


SAG1080 


47 


hypothetical protein 


SAG1081 


312 


ApbE family protein 


SAG1082 


200 


conserved hypothetical protein 


SAG1083 


411 


conserved hypothetical protein 


SAG1084 


262 


formate/nitrite transporter family protein 


SAG1085 


424 


xanthine permease 


SAG1086 


193 


xanthine phosphoribosyltransferase


SAG1087 


327 


guanosine monophosphate reductase 


SAG1088 


446 


drug resistance transporter, EmrB/QacA family, putative 


SAG1089 


230 


conserved hypothetical protein


SAG1090 


666 


potassium uptake protein, putative


SAG1091 


216 


oxidoreductase, short chain dehydrogenase/reductase family 


SAG1092 


330 


phosphate acetyltransferase 


SAG1093 


294 


ribosomal large subunit pseudouridine synthase, RluD subfamily


SAG1094 


278 


conserved hypothetical protein 


SAG1095 


223 


GTP pyrophosphokinase family protein 


SAG1096 


190 


conserved hypothetical protein 


SAG1097 


324 


ribose-phosphate pyrophosphokinase 


SAG1098 


371 


cysteine desulphurase 


SAG1099 


115 


conserved hypothetical protein


SAG1100 


210 


conserved hypothetical protein 


SAG1101 


226 


DNA repair protein RadC 


SAG1102 


377 


membrane protein, putative


SAG1103 


478 


6-phospho-beta-glucosidase


SAG1104


204 


platelet activating factor, putative


SAG1105 


273 


hydrolase, haloacid dehalogenase-like family


SAG1106


309 


transcriptional regulator, AraC family, putative


SAG1107 


510 


voltage-gated chloride channel family protein


SAG1108


357 


spermidine/putrescine ABC transporter, spermidine/putrescine-binding protein


SAG1109


258 


spermidine/putrescine ABC transporter, permease protein


SAG1110 


264 


spermidine/putrescine ABC transporter, permease protein


SAG1111 


384 


spermidine/putrescine ABC transporter, ATP-binding protein


SAG1112 


300 


UDP-N-acetylenolpyruvoylglucosamine reductase


SAG1113 


162 


2-amino-4-hydroxy-6-hydroxymethyldihydropteridine
pyrophosphokinase 


SAG1114 


120 


dihydroneopterin aldolase 


SAG1115 


267 


dihydropteroate synthase 


SAG1116 


187 


GTP cyclohydrolase I 


SAG1117 


420 


folylpolyglutamate synthase 


SAG1118 


295 


rarD protein 


SAG1119 


288 


homoserine kinase 


SAG1120 


427 


homoserine dehydrogenase 


SAG1121 


295 


polysaccharide deacetylase family protein 


SAG1122 


515 


transporter, BCCT family protein 


SAG1123 


34 


hypothetical protein 


SAG1124


458 


aldehyde dehydrogenase family protein 


SAG1125 


335 


membrane protein, putative 


SAG1126 


228 


protein of unknown function 


SAG1127 


446 


conserved domain protein 


SAG1128 


65 


transcriptional regulator, Cro/CI family 


SAG1129 


36 


hypothetical protein 


SAG1130 


49 


hypothetical protein 


SAG1131 


164 


thiol peroxidase 


SAG1132 


219 


conserved hypothetical protein 





SAG1133


254 


conserved hypothetical protein


SAG1134


213 


transcriptional regulator, GntR family/potassium uptake protein,
TrkA family


SAG1135 


183 


gls24 protein, putative


SAG1136 


65 


conserved hypothetical protein


SAG1137 


180 


gls24 protein, putative


SAG1138 


64 


conserved hypothetical protein


SAG1139 


193 


conserved hypothetical protein


SAG1140


82 


conserved hypothetical protein


SAG1141


112 


conserved hypothetical protein


SAG1142


759 


ATP-dependent DNA helicase PcrA


SAG1143


128 


conserved hypothetical protein


SAG1144


441 


uracil permease


SAG1145


448 


sodium:alanine symporter family protein


SAG1146


411


cation efflux family protein


SAG1147


130 


conserved hypothetical protein


SAG1148


231 


membrane protein, putative


SAG1149


707 


lipoprotein, putative


SAG1150


400 


ribosomal protein S1


SAG1151


76 


conserved hypothetical protein


SAG1152


340 


branched-chain amino acid aminotransferase


SAG1153


819


DNA topoisomerase IV, A subunit


SAG1154


653 


DNA topoisomerase IV, B subunit


SAG1155


212 


membrane protein, putative


SAG1156


217


uracil-DNA glycosylase


SAG1157


161 


conserved hypothetical protein


SAG1158 


413 


CMP-N-acetylneuraminic acid synthetase NeuA


SAG1159


209 


neuD protein


SAG1160


384 


UDP-N-acetylglucosamine-2-epimerase NeuC


SAG1161


341 


N-acetylneuraminic acid synthetase NeuB


SAG1162


466 


polysaccharide biosynthesis protein CpsL


SAG1163


318 


polysaccharide biosynthesis protein CpsK(V)


SAG1164


321 


glycosyl transferase CpsJ(V)


SAG1165


327 


glycosyl transferase CpsO(V)


SAG1166


295 


glycosyl transferase CpsN(V)


SAG1167


241 


polysaccharide biosynthesis protein CpsM(V)


SAG1168


364 


polysaccharide biosynthesis protein CpsH(V)


SAG1169


163 


glycosyl transferase CpsG(V)


SAG1170


149 


polysaccharide biosynthesis protein CpsF


SAG1171


462 


glycosyl transferase CpsE


SAG1172


229 


cpsD protein 


SAG1173


230 


cpsC protein 


SAG1174


243 


capsular polysaccharide biosynthesis protein CpsB 


SAG1175


485 


capsular polysaccharide biosynthesis protein CpsA 


SAG1176


290 


transcriptional regulator, LysR family, putative 


SAG1177


255 


conserved hypothetical protein 


SAG1178


236 


purine nucleoside phosphorylase 


SAG1179


418 


voltage-gated chloride channel family protein, putative


SAG1180


269 


purine nucleoside phosphorylase 


SAG1181 


135 


arsenate reductase 


SAG1182 


403 


phosphopentomutase 


SAG1183 


223 


ribose 5-phosphate isomerase 


SAG1184 


236 


conserved hypothetical protein 


SAG1185 


262 


tributyrin esterase 


SAG1186 


553 


metallo-beta-lactamase superfamily protein


SAG1187 


253 


ABC transporter, ATP-binding protein 


SAG1188 


287 


ABC transporter, permease protein 


SAG1189 


334 


conserved hypothetical protein


SAG1190 


551 


adherence and virulence protein A


SAG1191 


239 


alpha-acetolactate decarboxylase


SAG1192 


560


acetolactate synthase, catabolic


SAG1193 


408 


TPR domain protein


SAG1194 


396 


membrane protein, putative


SAG1195 


153 


MutT/nudix family protein


SAG1196 


160 


mutator MutT protein


SAG1197 


1072 


hyaluronidase


SAG1198


348 


dTDP-glucose 4,6-dehydratase


SAG1199 


197 


dTDP-4-dehydrorhamnose 3,5-epimerase


SAG1200 


289 


glucose-1-phosphate thymidylyltransferase


SAG1201 


367 


iminodiacetate oxidase, putative


SAG1202 


262 


conserved hypothetical protein TIGR00486


SAG1203 


227 


conserved hypothetical protein


SAG1204 


226 


DNA replication protein DnaD, putative


SAG1205 


172 


adenine phosphoribosyltransferase


SAG1206 


854


conserved domain protein


SAG1207 


32 


hypothetical protein


SAG1208 


732


single-stranded-DNA-specific exonuclease RecJ


SAG1209 


253 


oxidoreductase, short chain dehydrogenase/reductase family


SAG1210 


309 


metallo-beta-lactamase superfamily protein


SAG1211 


215


conserved hypothetical protein


SAG1212 


412 


GTP-binding protein HflX


SAG1213 


296 


tRNA delta(2)-isopentenylpyrophosphate transferase


SAG1214 


58 


hypothetical protein 


SAG1215 


305 


exfoliative toxin A, putative 


SAG1216 


1252 


pullulanase, putative 


SAG1217 


NA 


conserved hypothetical protein, authentic frameshift


SAG1218 


194 


conserved hypothetical protein


SAG1219 


468 


peptidase, M20/M25/M40 family


SAG1220 


200 


nitroreductase family protein 


SAG1221 


NA 


glycerophosphoryl diester phosphodiesterase, putative, authentic 
point mutation 


SAG1222 


593 


excinuclease ABC, C subunit 


SAG1223 


255 


conserved hypothetical protein 


SAG1224 


446 


MATE efflux family protein 


SAG1225 


136 


conserved hypothetical protein 


SAG1226 


165 


conserved hypothetical protein


SAG1227


198 


protein of unknown function 


SAG1228 


96 


ISSdyl, transposase OrfA 


SAG1229 


259 


ISSdyl, transposase OrfB 


SAG1230 


96 


conserved hypothetical protein 


SAG1231 


NA 


transposase OrfB, IS3 family, degenerate 


SAG1232 


77 


transposase OrfB, IS3 family, truncation 


SAG1233 


822 


streptococcal histidine triad family protein 


SAG1234 


306 


laminin-binding surface protein 


SAG1235 


425 


GBSil, group II intron, maturase 


SAG1236 


NA 


C5a peptidase, authentic frameshift


SAG1237 


444 


hypothetical protein 


SAG1238 


202 


hypothetical protein 


SAG1239 


76 


conserved hypothetical protein 


SAG1240 


125 


conserved hypothetical protein, truncation 


SAG1241 


91 


transposase OrfA, IS3 family 


SAG1242 


67 


transposase OrfB, IS3 family, truncation


SAG1243 


96 


ISSdyl, transposase OrfA


SAG1244 


259 


ISSdyl, transposase OrfB 


SAG1245 


38 


hypothetical protein 


SAG1246 


389 


hypothetical protein 


SAG1247 


399 


site-specific recombinase, phage integrase family


SAG1248 


75 


conserved hypothetical protein 


SAG1249 


74 


transcriptional regulator, Cro/CI family


SAG1250 


621 


Tn5252, relaxase 


SAG1251 


121 


Tn5252, Orf 9 protein 


SAG1252 


120 


Tn5252, Orf 10 protein


SAG1253 


435 


transposase, ISL3 family 


SAG1254 


546 


mercuric reductase 


SAG1255 


130 


mercuric resistance operon regulatory protein MerR 


SAG1256 


142 


IS861, transposase OrfB, truncation 


SAG1257 


709 


cation-transporting ATPase, E1-E2 family


SAG1258 


122 


cadmium efflux system accessory protein 


SAG1259 


99 


conserved hypothetical protein 


SAG1260 


262 


hypothetical protein 


SAG1261 


198 


conserved hypothetical protein 


SAG1262 


695 


cation-transporting ATPase, E1-E2 family 


SAG1263 


NA 


conserved domain protein, authentic frameshift 


SAG1264 


148 


transcriptional repressor CopY, putative 


SAG1265 


206 


cadmium resistance transporter, putative 


SAG1266 


152 


hypothetical protein 


SAG1267 


108 


hypothetical protein 


SAG1268 


230 


repressor protein, putative 


SAG1269 


44 


hypothetical protein 


SAG1270 


471 


ImpB/MucB/SamB family protein 


SAG1271 


116 


conserved hypothetical protein 


SAG1272 


102 


conserved hypothetical protein 


SAG1273 


118 


conserved hypothetical protein 


SAG1274 


129 


conserved hypothetical protein


SAG1275 


75 


hypothetical protein 


SAG1276 


358 


conserved hypothetical protein 


SAG1277 


163 


hypothetical protein 


SAG1278 


96 


hypothetical protein 


SAG1279 


99 


conserved domain protein


SAG1280 


2274 


SNF2 family protein 


SAG1281 


183 


hypothetical protein 


SAG1282 


63 


calcium-binding protein, putative 


SAG1283 


1631 


agglutinin receptor 


SAG1284 


196 


abortive infection protein AbiGI 


SAG1285 


281 


abortive infection protein AbiGII 


SAG1286 


933 


Tn5252, Orf28


SAG1287 


776 


Tn5252, Orf26


SAG1288 


NA 


Tn5252, Orf25, degenerate


SAG1289 


284 


Tn5252, Orf23


SAG1290 


80 


hypothetical protein 


SAG1291 


605 


Tn5252, Orf 21 protein, internal deletion 


SAG1292 


162 


hypothetical protein 


SAG1293 


194 


protease, putative 


SAG1294 


77 


conserved hypothetical protein 


SAG1295 


127 


conserved hypothetical protein 


SAG1296 


142 


conserved hypothetical protein 


SAG1297 


451 


C-5 cytosine-specific DNA methylase 


SAG1298 


31 


hypothetical protein 


SAG1299 


272 


conserved hypothetical protein 


SAG1300 


57 


conserved hypothetical protein 


SAG1301 


121 


ribosomal protein L7/L12 


SAG1302 


166 


ribosomal protein L10 


SAG1303 


702 


ATP-dependent Clp protease, ATP-binding subunit


SAG1304 


32 


hypothetical protein 


SAG1305 


314 


homocysteine S-methyltransferase MmuM, putative 


SAG1306 


458 


amino acid permease 


SAG1307 


216 


hypothetical protein 


SAG1308 


167 


hypothetical protein 


SAG1309 


30 


hypothetical protein 


SAG1310 


182 


transcriptional regulator, TetR family


SAG1311 


198 


GTP-binding protein


SAG1312 


408 


ATP-dependent Clp protease, ATP-binding subunit ClpX 


SAG1313 


56 


conserved hypothetical protein 


SAG1314 


164 


dihydrofolate reductase 


SAG1315 


279 


thymidylate synthase 


SAG1316 


390 


HMG-CoA synthase 


SAG1317 


427 


3-hydroxy-3-methylglutaryl-CoA reductase 


SAG1318 


149 


conserved hypothetical protein 


SAG1319 


214 


hemolysin III, putative 


SAG1320 


304 


conserved hypothetical protein TIGR00147 


SAG1321 


284 


glutathione S-transferase family protein, putative 


SAG1322 


72 


conserved domain protein


SAG1323 


331 


isopentenyl-diphosphate delta-isomerase 


SAG1324 


330 


phosphomevalonate kinase 


SAG1325 


314 


diphosphomevalonate decarboxylase 


SAG1326 


292 


mevalonate kinase, putative 


SAG1327 


409 


sensor histidine kinase 


SAG1328 


228 


DNA-binding response regulator


SAG1329 


208 


GTP pyrophosphokinase family protein


SAG1330 


68 


hypothetical protein


SAG1331 


979 


R5 protein 



SAG1332


146 


transcriptional regulator, MarR family, putative


SAG1333 


690 


5'-nucleotidase family protein


SAG1334 


136 


polypeptide deformylase, putative


SAG1335 


449 


NADP-specific glutamate dehydrogenase


SAG1336 


169 


membrane protein, putative


SAG1337 


589 


ABC transporter, ATP-binding/permease protein


SAG1338 


579 


ABC transporter, ATP-binding/permease protein


SAG1339 


157 


acetyltransferase, GNAT family


SAG1340 


622 


ABC transporter, ATP-binding protein


SAG1341 


402 


polyA polymerase family protein


SAG1342 


282 


DegV family protein


SAG1343 


126 


protein of unknown function


SAG1344 


177 


hypothetical protein


SAG1345 


164 


conserved hypothetical protein


SAG1346 


654 


PTS system, fructose specific IIABC components


SAG1347 


303 


1-phosphofructokinase


SAG1348 


247 


lactose phosphotransferase system repressor


SAG1349 


411 


beta-lactam resistance factor 



SAG1350 


544 


surface antigen-related protein 


SAG1351 


307 


2-dehydropantoate 2-reductase, putative 


SAG1352


356


regulatory protein, putative


SAG1353 


330 


pyridine nucleotide-disulphide oxidoreductase family protein


SAG1354 


251 


tRNA (guanine-N1)-methyltransferase


SAG1355 


172 


16S rRNA processing protein RimM 


SAG1356 


503 


transcriptional regulator, RofA family


SAG1357 


80 


KH domain protein 


SAG1358 


90 


ribosomal protein S16 


SAG1359 


415 


permease, putative


SAG1360 


236 


ABC transporter, ATP-binding protein


SAG1361 


414 


conserved hypothetical protein 


SAG1362 


532 


carbamoyl-phosphate synthase, large subunit, putative


SAG1363 


356 


carbamoyl-phosphate synthase, small subunit 


SAG1364 


173 


pyrimidine operon regulatory protein 


SAG1365 


296 


ribosomal large subunit pseudouridine synthase, RluD subfamily 


SAG1366 


154 


lipoprotein signal peptidase 


SAG1367 


301 


transcriptional regulator, LysR family 


SAG1368 


94 


ribosomal protein L27 


SAG1369 


112 


conserved hypothetical protein 


SAG1370 


104 


ribosomal protein L21


Table 1: Complete list of GBS predicted genes



ORF


Size
(a.a.)


Annotation


SAG1371 


392 


conserved hypothetical protein


SAG1372 


404 


thiamine biosynthesis protein ThiI


SAG1373 


381 


cysteine desulphurase 


SAG1374 


150 


conserved hypothetical protein 


SAG1375 


449 


glutathione reductase 


SAG1376 


111 


conserved hypothetical protein 


SAG1377 


388 


chorismate synthase 


SAG1378 


355 


3-dehydroquinate synthase 


SAG1379 


225 


3-dehydroquinate dehydratase 


SAG1380 


385 


conserved hypothetical protein 


SAG1381 


714 


sulfatase 


SAG1382 


119 


ribosomal protein L20 


SAG1383 


66 


ribosomal protein L35 


SAG1384 


176 


translation initiation factor IF-3 


SAG1385 


227 


cytidylate kinase


SAG1386 


174 


conserved hypothetical protein


SAG1387 


65 


ferredoxin, 4Fe-4S


SAG1388 


163 


conserved hypothetical protein


SAG1389 


406 


peptidase T 


SAG1390 


544 


polysaccharide biosynthesis protein, putative


SAG1391 


484 


UDP-N-acetylmuramoylalanyl-D-glutamate--2,6-diaminopimelate ligase


SAG1392 


264 


iron compound ABC transporter, ATP-binding protein


SAG1393 


310 


iron compound ABC transporter, substrate-binding protein


SAG1394 


341 


iron compound ABC transporter, permease protein


SAG1395 


333 


iron compound ABC transporter, permease protein


SAG1396 


217 


conserved hypothetical protein 


SAG1397 


311 


inorganic pyrophosphatase, manganese-dependent 


SAG1398 


262 


pyruvate formate-lyase-activating enzyme 


SAG1399 


444 


CBS domain protein 


SAG1400 


188 


conserved hypothetical protein 


SAG1401 


311 


conserved hypothetical protein TIGR01212 


SAG1402 


213 


PAP2 family protein 



SAG1403 


194 


membrane protein, putative 


SAG1404 


308 


cell wall surface anchor family protein 


SAG1405 


294 


sortase family protein 


SAG1406 


293 


sortase family protein 


SAG1407 


705 


cell wall surface anchor family protein 


SAG1408 


901 


cell wall surface anchor family protein 


SAG1409 


NA 


rogB protein, authentic frameshift 


SAG1410 


379 


glycosyl transferase, group 1 family protein 


SAG1411 


282 


glycosyl transferase, group 2 family protein 


SAG1412 


474 


polysaccharide biosynthesis protein 


SAG1413 


454 


membrane protein, putative 


SAG1414 


308 


glycosyl transferase, group 2 family protein 


SAG1415 


311 


glycosyl transferase, group 2 family protein 


SAG1416 


352 


nucleotide sugar dehydratase, putative 


SAG1417 


240 


nucleotidyl transferase, putative


SAG1418 


274 


polysaccharide biosynthesis protein, putative


SAG1419 


577 


lipoprotein, putative


SAG1420 


117 


conserved hypothetical protein 


SAG1421 


243 


glycosyl transferase, group 2 family protein


SAG1422 


313 


glycosyl transferase, group 2 family protein


SAG1423


384 


glycosyl transferase, putative


SAG1424 


284 


dTDP-4-dehydrorhamnose reductase


SAG1425 


113 


conserved hypothetical protein


SAG1426 


369 


RNA polymerase sigma-70 factor


SAG1427 


602 


DNA primase


SAG1428 


125 


large conductance mechanosensitive channel protein


SAG1429 


58 


ribosomal protein S21


SAG1430 


167 


conserved hypothetical protein


SAG1431 


268 


amino acid ABC transporter, amino acid-binding protein


SAG1432 


347 


ammonium transporter family protein


SAG1433 


375 


conserved hypothetical protein


SAG1434 


328 


rhodanese family protein


SAG1435 


101 



conserved hypothetical protein


SAG1436 


457 




SAG1437 


55 


hypothetical protein


SAG1438 


754 


glycogen phosphorylase


SAG1439 


498 


4-alpha-glucanotransferase


SAG1440 


342 


maltose operon repressor MalR, putative


SAG1441 


415 


maltose/maltodextrin ABC transporter, maltose/maltodextrin-binding protein


SAG1442 


456 


maltose ABC transporter, permease protein


SAG1443 


278 


maltose ABC transporter, permease protein


SAG1444 


490 


proton/peptide symporter family protein


SAG1445 


NA


MutT/nudix family protein, authentic frameshift


SAG1446 


62 


hypothetical protein 


SAG1447 


441 


conserved hypothetical protein 


SAG1448 


502 


glycosyl transferase, group 1 family protein


SAG1449 


795 


preprotein translocase, SecA subunit, putative


SAG1450 


330 


conserved domain protein 


SAG1451 


494 


conserved hypothetical protein 


SAG1452 


514 


conserved hypothetical protein 


SAG1453 


409 


preprotein translocase, SecY family protein


SAG1454 


398 


glycosyl transferase, putative 


SAG1455 


295 


glycosyl transferase, group 2 family protein


SAG1456 


NA 


glycosyl transferase, family 8, degenerate


SAG1457 


129 


IS1381, transposase OrfB


SAG1458 


127 


IS1381, transposase OrfA 


SAG1459 


413 


glycosyl transferase, family 8


SAG1460 


401 


glycosyl transferase, family 8 


SAG1461 


335 


conserved hypothetical protein 


SAG1462 


970 


cell wall surface anchor family protein 


SAG1463 


NA 


transcriptional regulator, RofA family, authentic point mutation 


SAG1464 


663 


excinuclease ABC, B subunit


SAG1465 



306 


protease, putative 



SAG1466 


727 


glutamine ABC transporter, glutamine-binding protein/permease 
protein 


SAG1467


240


glutamine ABC transporter, ATP-binding protein GlnQ, putative


SAG1468


116


conserved hypothetical protein 


SAG1469


52 


conserved hypothetical protein 


SAG1470


437 


GTP-binding protein, GTP1/Obg family


SAG1471


42 


conserved hypothetical protein 


SAG1472


413


aminopeptidase PepS 


SAG1473


192


cell wall surface anchor family protein 


SAG1474


680


amidase family protein 


SAG1475


240 


ribosomal small subunit pseudouridine synthase A 


SAG1476


280


oxidoreductase, aldo/keto reductase family 


SAG1477


224 


nitroreductase family protein


SAG1478


130


lactoylglutathione lyase 


SAG1479


308 


glycosyl transferase, group 2 family protein


SAG1480


462 


amino acid permease 


SAG1481


155 


SsrA-binding protein


SAG1482


801 


exoribonuclease, VacB/Rnb family


SAG1483 


78 


preprotein translocase, SecG subunit 


SAG1484


48


ribosomal protein L33 


SAG1485


389


multi-drug resistance protein 


SAG1486


548 


membrane protein, putative 


SAG1487


233


ABC transporter, ATP-binding protein


SAG1488


195


dephospho-CoA kinase 


SAG1489


273


formamidopyrimidine-DNA glycosylase 


SAG1490


282 


transcriptional regulator, MutR family 


SAG1491


530 


hypothetical protein 


SAG1492


58 


hypothetical protein 


SAG1493


oo 


hypothetical protein 


SAG1494


32 


hypothetical protein 


SAG1495 


81 


CAAX amino terminal protease family protein 


SAG1496


110


hypothetical protein 


SAG1497


37 


hypothetical protein 


SAG1498


133 


hypothetical protein


SAG1499


299


GTP-binding protein Era


SAG1500


132 


diacylglycerol kinase 


SAG1501


161


conserved hypothetical protein TIGR00043 


SAG1502


268 


tetracenomycin polyketide synthesis O-methyltransferase TcmP, 
putative 






hypothetical protein


SAG1504 


38 


hypothetical protein 


SAG1505 


158 


MutT/nudix family protein 


SAG1506


267 


hypothetical protein 


SAG1507 


345 


PhoH family protein 


SAG1508 


590 


67 kDa Myosin-crossreactive streptococcal antigen 


SAG1509 


71 


conserved hypothetical protein 


SAG1510 


169 


peptide methionine sulfoxide reductase


SAG1511 


284 


conserved hypothetical protein


SAG1512 


185 


ribosome recycling factor


SAG1513 


242 


uridylate kinase


SAG1514 


226 


peptide ABC transporter, ATP-binding protein


SAG1515


262 


peptide ABC transporter, ATP-binding protein


SAG1516


255 


peptide ABC transporter, permease protein


SAG1517 


314 


peptide ABC transporter, permease protein


SAG1518 


538 


peptide ABC transporter, peptide-binding protein


SAG1519 


229 


ribosomal protein L1


SAG1520 


141 



ribosomal protein L11


SAG1521 


388 


transposase, IS30 family, putative


SAG1522 


460 


transporter, major facilitator family


SAG1523 


404 


peptidase, M20/M25/M40 family


SAG1524 


294 


transcriptional regulator, LysR family


SAG1525 


117 




SAG1526 


178 


IS861, transposase OrfA


SAG1527


277 


IS861, transposase OrfB


SAG1528


571 


chorismate binding enzyme


SAG1529 


816 


FtsK/SpoIIIE family protein


SAG1530


9^7 

ZD/ 


peptidyl-prolyl cis-trans isomerase, cyclophilin-type 


SAG1531


977 


manganese ABC transporter, permease protein


SAG1532


93R 


manganese ABC transporter, ATP-binding protein


SAG1533




manganese ABC transporter, manganese-binding adhesion


SAG1534 


215 


iron-dependent transcriptional regulator


SAG1535 


229 


5-methylthioadenosine nucleosidase/S-adenosylhomocysteine nucleosidase


SAG1536 


89 


conserved hypothetical protein


SAG1537 



184 


MutT/nudix family protein


SAG1538 


459 


UDP-N-acetylglucosamine pyrophosphorylase


SAG1539 


31 


hypothetical protein


SAG1540 


137 


conserved hypothetical protein


SAG1541 


125 


glyoxalase family protein


SAG1542 


318 


oxidoreductase, Gfo/Idh/MocA family


SAG1543 


NA 


conserved hypothetical protein, authentic frameshift


SAG1544 


232 


gluconate 5-dehydrogenase, putative


SAG1545 


78 


conserved hypothetical protein


SAG1546 


82 


conserved hypothetical protein


SAG1547 


166 


acetyltransferase, GNAT family


SAG1548 


422 


glycosyl transferase, group 2 family protein


SAG1549 


127 


IS1381, transposase OrfA


SAG1550 


129 


IS1381, transposase OrfB


SAG1551 


67 


hypothetical protein


SAG1552 


719 


conserved hypothetical protein 


SAG1553 


477 


hypothetical protein 


SAG1554 


225 


hypothetical protein 


SAG1555 


231 


hypothetical protein 


SAG1556 


445 


branched-chain amino acid transport system II carrier protein


SAG1557


ODD 


methionyl-tRNA synthetase


SAG1558


zyi 


tellurite resistance protein TehB 


SAG1559




membrane protein, putative 


SAG1560


4U 


hypothetical protein 


SAG1561


A AC 
4U3 


PTS system, IIC component, putative


SAG1562


OOA 

zoU 


conserved hypothetical protein 


SAG1563


2/D 


exodeoxyribonuclease 


SAG1564


1 1 o 
1 lo 


conserved hypothetical protein 


SAG1565


1 <Q 

IDo 


methylated-DNA--protein-cysteine S-methyltransferase


SAG1566




D-isomer specific 2-hydroxyacid dehydrogenase family protein 


SAG1567


1 oo 


acetyltransferase, GNAT family


SAG1568


NA


phosphoserine aminotransferase, authentic frameshift 




211 


copper homeostasis protein CutC, putative 


SAG1570


J4 


conserved hypothetical protein 


SAG1571


53 


hypothetical protein 


SAG1572 


287 


tetrapyrrole methylase family protein 


SAG1573 


108


conserved hypothetical protein 


SAG1574


287 


DNA polymerase III, delta prime subunit, putative


SAG1575 


211 


thymidylate kinase 


SAG1576


267 


transposase, IS30 family, putative, truncation 


SAG1577 


219 


AcuB family protein 


SAG1578 


236 


branched-chain amino acid ABC transporter, ATP-binding protein 


SAG1579


254 


branched-chain amino acid ABC transporter, ATP-binding protein


SAG1580


317 


branched-chain amino acid ABC transporter, permease protein


SAG1581


289 


branched-chain amino acid ABC transporter, permease protein 


SAG1582


388


branched-chain amino acid ABC transporter, amino acid-binding 
protein 


SAG1583


01 
ol 


conserved hypothetical protein 


SAG1584


511 


IS1548, transposase


SAG1585


1 1 Q£ 

iyo 


ATP-dependent Clp protease, proteolytic subunit ClpP 


SAG1586




uracil phosphoribosyltransferase 


SAG1587


Joy 


aminotransferase, class I 


SAG1588


1 CO 


RNA methyltransferase, TrmH family, group 2 


SAG1589


A<A 
4jU 


amino acid permease, putative




44y 


potassium uptake protein, Trk family


SAG1591


A*7C 


cation uptake protein, Trk family


SAG1592


OJ 


conserved hypothetical protein TIGR00278




OAA 


ribosomal large subunit pseudouridine synthase B


SAG1594


1 OA 

iy4 


conserved hypothetical protein TIGR00281


SAG1595




conserved hypothetical protein


SAG1596 


246 


integrase/recombinase, phage integrase family


SAG1597 


157 


CBS domain protein 


SAG1598 


173 


conserved hypothetical protein 


SAG1599 


324 


HAM1 protein 


SAG1600 


264 


glutamate racemase 


SAG1601 


79 


conserved hypothetical protein 


SAG1602 


180 


membrane protein, putative 


SAG1603 


173 


transcriptional regulator, biotin repressor family


SAG1604


770 


membrane protein, putative


SAG1605


167 


conserved hypothetical protein


SAG1606


747 


RNA methyltransferase, TrmH family


SAG1607


07 


acylphosphatase


SAG1608


31 n 

311/ 


lipoprotein, putative


SAG1609


771 


amino acid ABC transporter, permease protein


SAG1610




amino acid ABC transporter, substrate-binding protein


SAG1611


4R£ 


amidase family protein


SAG1612


160 


transcription elongation factor GreA




ouu 


conserved hypothetical protein


SAG1614


ID/ 


acetyltransferase, GNAT family


SAG1615




UDP-N-acetylmuramate--alanine ligase


SAG1616


70<* 
ZUj 


conserved hypothetical protein




^7 


hypothetical protein


SAG1618


1 037 


SNF2 family protein


SAG1619


1T7 
Jit 


IS1548, transposase


SAG1620




phosphoglycerate dehydrogenase-related protein 


SAG1621


inn 


primosomal protein DnaI






conserved hypothetical protein 






conserved hypothetical protein TIGR00244


SAG1624


^ni 


sensor histidine kinase CsrS


SAG1625


77Q 

zzy 


DNA-binding response regulator CsrR


SAG1626


1 77 
III 


conserved hypothetical protein 




zyo 


heat shock protein HtpX 




1 ft4 
lOH 


LemA protein


SAG1629


717 


glucose-inhibited division protein B


SAG1630




sodium transport family protein


SAG1631


773 

ZZJ 


potassium uptake protein, Trk family, putative


SAG1632


z /o 


cobalt transport family protein


SAG1633


JJO 


ABC transporter, ATP-binding protein


SAG1634


717 

Z1Z 


conserved hypothetical protein


SAG1635


407 


sodium:dicarboxylate symporter family protein


SAG1636


4^5 


branched-chain amino acid transport system II carrier protein


SAG1637 


351 


alcohol dehydrogenase, zinc-containing


SAG1638 


730 


ABC transporter, permease protein


SAG1639


356 


ABC transporter, ATP-binding protein


SAG1640 


4SR 


peptidase, M20/M25/M40 family


SAG1641 


774 



x acv/ ituiiiiy protein 


SAG1642


777 


ABC transporter, substrate-binding protein


SAG1643 


770 


glutamine amidotransferase, class I


SAG1644 


37 


hypothetical protein 


SAG1645 


238 


conserved hypothetical protein TIGR01033 


SAG1646 


32 


hypothetical protein 


SAG1647 


328 


dihydroxyacetone kinase family protein 


SAG1648 


178 


transcriptional regulator, TetR family, putative 


SAG1649 


37


hypothetical protein 


SAG1650 


329


dihydroxyacetone kinase family protein 


SAG1651 


192 


dihydroxyacetone kinase family protein


SAG1652


X Z*T 


conserved hypothetical protein


SAG1653


937 


glycerol uptake facilitator protein


SAG1654


134 


conserved hypothetical protein


SAG1655


937 


tT^iTiooMTi+irtncil tv3»r»n1 o.4/vt* A^pt*1? "farinilv 

udiisvxxpuuxxax xvguiaiui 9 iv a wxv xaxxiiiy 


SAG1656


369 


conserved hypothetical protein


SAG1657


R3 


hypothetical protein


SAG1658


944 


conserved hypothetical protein


SAG1659


1 1 R 
X 1 o 


iojap-related protein


SAG1660


173 


isochorismatase family protein


SAG1661


10S 


conserved hypothetical protein TIGR00488


SAG1662


910 
zxu 


conserved hypothetical protein TIGR00482


SAG1663


ins 


conserved hypothetical protein TIGR00253


SAG1664


379 


GTP-binding protein


SAG1665


177 
1 / / 


hydrolase, haloacid dehalogenase-like family


SAG1666


304 


membrane protein, putative


SAG1667


*foU 


glutamyl-tRNA(Gln) amidotransferase, B subunit


SAG1668


*froo 


glutamyl-tRNA(Gln) amidotransferase, A subunit


SAG1669


i no 


glutamyl-tRNA(Gln) amidotransferase, C subunit




ool 


pyruvate phosphate dikinase


SAG1671


97A 


protein of unknown function


SAG1672


170 


CBS domain protein






3-hydroxyacyl-CoA dehydrogenase family protein


SAG1674


1 R9 
1 oZ 


isochorismatase family protein


SAG1675


961 
ZOI 


inxnscripuonai reguiaior lou i , puiaixvc 


SAG1676


403 


ax Nino u an s 1 crabe 9 vxabs x \ 


SAG1677


i so 


conserved hypothetical protein


SAG1678


460 


hydrolase, haloacid dehalogenase-like family


SAG1679


390 


asparaginase family protein


SAG1680


909 


shikimate 5-dehydrogenase


SAG1681


304 


oxidoreductase, aldo/keto reductase family


SAG1682


671 


ATP-dependent DNA helicase RecG


SAG1683


S19 


immunogenic secreted protein, putative


SAG1684


366 


alanine racemase


SAG1685


119 

117 


holo-(acyl-carrier-protein) synthase


SAG1686


33S 


phospho-2-dehydro-3-deoxyheptonate aldolase


SAG1687


R49 


preprotein translocase, SecA subunit


SAG1688


31 S 


mannose-6-phosphate isomerase, class I


SAG1689


993 


fructokinase


SAG1690


639 


PTS system, IIABC components


SAG1691 


479 


sucrose-6-phosphate hydrolase


SAG1692 


320 


sucrose operon repressor ScrR 


SAG1693 


144 


N utilization substance protein B 


SAG1694 


129 


conserved hypothetical protein 


SAG1695 


186 


translation elongation factor P 


SAG1696 


38 


hypothetical protein 


SAG1697 


48 


hypothetical protein 


SAG1698 


99 


conserved hypothetical protein 


SAG1699 


30 


hypothetical protein


SAG1700


76 


hypothetical protein


SAG1701


56 


hypothetical protein


SAG1702


41 


hypothetical protein


SAG1703 


54 


hypothetical protein


SAG1704


150 


cytidine/deoxycytidylate deaminase family protein


SAG1705


NA


peptidase, M24 family, authentic point mutation


SAG1706


i-JO 


conserved hypothetical protein


SAG1707


499 


UlUg lColoulLlviC U CUia|/VJl ICl y IZtllllXJ/ V^ttv/A KUlllljf 


SAG1708


JO 


hypothetical protein


SAG1709


949 


CAvlllUwiCooC ADU, AV oUUUllll 


SAG1710


993


conserved hypothetical protein


SAG1711


314


magnesium transporter, CorA family


SAG1712


70 

/7 


ribosomal protein S18


Or\vJ 1 / 1 J 


1 6^ 


single-strand binding protein


OAOI 71 4 


7-> 


ribosomal protein S6


Q Af^171 ^ 


^74 
J /H- 


A/G-specific adenine glycosylase


o/tlVJ 1 / 1 0 


107 


transcriptional regulator, Cro/CI family


QAH-1717 


1H4 


thioredoxin


QAm 71 Q 
o/\VJ 1 / 1 o 


1 

100 


r/irz iamny pro xein 


O/tAJ 1 / i7 


770 

/ /7 


MutS2 family protein


0/\vJ 1 / Z.K3 


1 SO 

1 OU 


conserved hypothetical protein


^AG1 771 
uAVJl /^l 


10^ 


conserved hypothetical protein


^ Am 779 


907 


llUUIiUwiCdoC lllll 


^A0179^ 


107 

17/ 


signal peptidase I


^AG1794 




llCllwaoC, pUUlLlVC 


^AG179S 


i^n 


conserved hypothetical protein


^AO!796 


364 


i^iN/\-uaiu<tgc a *uiuuuiuic pro IC ill JT 


QAG1797 


770 

# /IA 


1171 ilia iw civciy l u. cuioici doc 


<5AG179R 


194 


jt iviiN - pmmug piAJiviii 


SAG1729


309


conserved hypothetical protein


SAG1730


951


conserved hypothetical protein


SAG1731 


298 


111 Will UL dH W JLflW^lWUl, pUtClVlVW 


SAG1732 


282


glycerol uptake facilitator protein, putative


SAG1733


150


universal stress protein family


SAG1734


400


transporter, putative


SAG1735


219


transcriptional regulator, Crp/Fnr family


SAG1736


761


X-Pro dipeptidyl-peptidase


SAG1737


119


hypothetical protein


SAG1738


326


polyprenyl synthetase family protein


SAG1739


582


ABC transporter, ATP-binding protein CydC


SAG1740 


572 


ABC transporter, ATP-binding protein CydD 


SAG1741 


339 


cytochrome d ubiquinol oxidase, subunit II 


SAG1742 


475 


cytochrome d oxidase, subunit I 


SAG1743 


402 


pyridine nucleotide-disulphide oxidoreductase family protein 


SAG1744 


299 


prenyltransferase, UbiA family 


SAG1745 


148 


hypothetical protein 


SAG1746 


35 


hypothetical protein


SAG1747 


99 


conserved hypothetical protein TIGR00103 


SAG1748


196


cyclopropane-fatty-acyl-phospholipid synthase


SAG1749


241


transcriptional regulator, MerR family


SAG1750


195


exonuclease


SAG1751


178


conserved hypothetical protein


SAG1752


190


conserved hypothetical protein TIGR00275


SAG1753




conserved hypothetical protein


SAG1754


89


ribosomal protein S14


SAG1755


18


hypothetical protein


SAG1756


141


conserved hypothetical protein


SAG1757


116


O-sialoglycoprotein endopeptidase family protein


SAG1758


115


ribosomal-protein-alanine acetyltransferase, putative


SAG1759


910


protein of unknown function


SAG1760


76


conserved hypothetical protein


c Ar:i7Ai 

O/WJl /Ol 


^50 


UICI^IU UCIA lawluUludv oUjp wlXCUlllljr lylV/LWlJUL 


DAVJl fOA 


1 AO 




O/WJl /Oj 




glutamine synthetase, type I


o/\vjr 1 /OH- 




transcriptional regulator GlnR


OAVJTI /Oj 


17Q 


conserved hypothetical protein


oavji /OO 


10ft 

J70 


phosphoglycerate kinase


OAVJi /O/ 


9ft0 


acid phosphatase


CAfll 16SL 


JjO 


glyceraldehyde 3-phosphate dehydrogenase


o/Vvji ioy 


AQ9 
oyz 


translation elongation factor G


QAni 770 


1 S6 
1 JO 


ribosomal protein S7


o/\vjr i / / 1 


1 17 
lj / 


ribosomal protein S12


Q Afi 1 779 


970 


pur operon repressor


<*AG1771 


111 
ji j 


U P| Hnmain ni*rttf*in 

XXLy UUllKUil pi v# twill 


SAG1774


494


conserved hypothetical protein


SAG1775


910


conserved hypothetical protein


SAG1776


990


ribulose-phosphate 3-epimerase


SAG1777


900


conserved hypothetical protein TIGR00157


SAG1778


981


rRNA (guanine-N1-)-methyltransferase, putative


SAG1779


900


dimethyladenosine transferase


SAG1780


161


hypothetical protein


SAG1781


186


primase-related protein


SAG1782


760


deoxyribonuclease, TatD family


SAG1783


90


hypothetical protein


SAG1784


110


hypothetical protein


SAG1785


410


hypothetical protein


SAG1786


110


protein of unknown function


SAG1787


420


dltD protein


SAG1788 


79 


D-alanyl carrier protein 


SAG1789 


421 


dltB protein 


SAG1790 


511 


D-alanine-activating enzyme


SAG1791 


395 


sensor histidine kinase 


SAG1792 


224 


DNA-binding response regulator 


SAG1793 


44 


ribosomal protein L34 


SAG1794 


451 


membrane protein, putative 


SAG1795 


388 


transposase, IS30 family, putative 


c a m OOA 

oALtI /yo 


J /D 


amino acid ABC transporter, permease protein


q a rii OOO 
oAVJl /y / 


A(Y7 


amino acid ABC transporter, ATP-binding protein


g a/ii too 
oavji /yo 


70 

J7 


hypothetical protein 


G A fit TOO 

oAaji /yy 


TOO 

/yz 


xylulose-5-phosphate/fructose-6-phosphate phosphoketolase


G A /II qaa 
oACrloOO 


KO 


conserved hypothetical protein 


CApi OA1 

oAvjloUl 


^0 
DDy 


transcriptional antiterminator, BglG family


g a m ono 


0^1 
2D3 


conserved hypothetical protein 


oAIj1oU3 


DUD 


carbohydrate kinase, FGGY family


G Aril Qf\A 
oALj1oU4 


QOO 

3zy 


hypothetical protein 


c a m one 
dAvjIoUD 


4o3 


PTS system, IIC component, putative 


g a m OAA ! 


31o 


glyoxylate reductase, NADH-dependent 


G A ni OA'? 

0AOI0U/ 


no 

jjy 


hypothetical protein 


g a m OAO 


OOT 

32/ 


sugar binding transcriptional regulator, LacI family 


G A r* 1 OAO 


215 


transaldolase family protein 


G A r*1 01 A 


23 0 


carbohydrate isomerase, AraD/FucA family 




00*7 
2o7 


hexulose-6-phosphate isomerase, putative 


G a r»i 010 


221 


hexulose-6-phosphate synthase, putative 


OAP1011 


161 


PTS system, IIA component 


SAG1814


AO 

92 


PTS system, IIB component 


SAG1815 


/IT A 

479 


transport protein SgaT, putative


OAPioii: 
oACjIoIo 


205 


hypothetical protein 


oAUlol / 


157 


hypothetical protein 


oAvjrlolo 


430 


adenylosuccinate synthetase 


G A /II 01Q 


1/1 A 

340 


perfringolysin O regulator protein 


G A /II QOA 

oAvjloZU 


224 


conserved hypothetical protein 


C A /II ©O 1 


OCA 

/50 


glutamate— cysteine ligase/amino acid ligase, putative 


c a m coo 


000 
2/2 


protein of unknown function 


g Am qoi 


41o 


protein of unknown function 


G A m QOA 
oAvjl 5Z4 


001 
291 


chaperonin, 33 kDa 


G Am ftO< 
OAVJloZD 


10< 
JZD 


NifR3/SMM1 family protein


G Aril SO A 
O AVJ 1 oZO 


011 
Zi J 


deoxynucleoside kinase family protein


oAul oZ / 


IOj 


phosphinothricin N-acetyltransferase


G A/11 20Q 
oAu 1 0Z0 


51D 


ATP-dependent Clp protease, ATP-binding subunit


QAni 000 


1 ^A 


transcriptional regulator CtsR


g Arii 

oAvjlOJU 


1DJ 


conserved hypothetical protein


C Aril STW 




translation elongation factor Ts


O/WJl OjZ 


ZjO 


ribosomal protein S2


OAni Ml 

o/wjti ojj 


1 

JLoO 


alkyl hydroperoxide reductase, subunit


Q A/*1 MA 


^1 A 
D 1U 


alkyl hydroperoxide reductase, subunit F


g Arii qoc 

o/\vjr 1 odd 


1 ^A 


conserved hypothetical protein


SAG1836 


61 


conserved hypothetical protein


SAG1837 


468 


prophage LambdaSa2, lysin, putative 


SAG1838 


109 


prophage LambdaSa2, holin, putative 


SAG1839 


136 


conserved hypothetical protein 


SAG1840 


112 


hypothetical protein 


SAG1841 


76 


conserved domain protein 


SAG1842 


1224 


prophage LambdaSa2, PblB, putative 


SAG1843 


240 


conserved hypothetical protein 


SAG1844


911


conserved hypothetical protein


SAG1845


42


hypothetical protein


SAG1846


158


hypothetical protein


SAG1847


227


conserved hypothetical protein


SAG1848


114


conserved hypothetical protein


SAG1849


115


hypothetical protein


SAG1850


101


hypothetical protein


SAG1851


111


conserved domain protein


SAG1852 


420 


conserved domain protein 




180 


prophage LambdaSa2, protease, putative


SAG1854


380


conserved hypothetical protein


SAG1855


570


prophage LambdaSa2, terminase large subunit, putative


SAG1856


161


hypothetical protein


SAG1857


119 

117 


prophage LambdaSa2, HNH endonuclease family protein


SAG1858 


95 


hypothetical protein


SAG1859


i OU 


prophage LambdaSa2, site-specific recombinase, phage integrase
family


SAG1860


154


conserved hypothetical protein


SAG1861


119


prophage LambdaSa2, transcriptional regulator, Cro/CI family


SAG1862 


86 


hypothetical protein 




1^8 

1JO 


prophage LambdaSa2, single-strand binding protein


SAG1864


uo 


hypothetical protein


SAG1865 


74 


conserved hypothetical protein 


SAG1866


1 HQ 


conserved hypothetical protein


SAG1867


16^ 


conserved hypothetical protein


SAG1868 


1^4 
i j*t 


hypothetical protein


SAG1869




prophage LambdaSa2, type II DNA modification
methyltransferase, putative


SAG1870 


273 


nrfinViJiO'e T JiinHHaSji 4 ? DNA r**rilif^5itir>Ti nrritf»i n yinnr^ mitsitivp 

pi U|JHOgC X-rcU(JUUClOCL^ 9 J^/XN^\ ICpilWtLlUll p&Utdli XVlldV^ 9 pilldllVC 


SAG1871 


248 


nmnHaoe T flmlvifiSa^ HaoterirvrkKaore Tenlipatirin 

L/XV/piXO^w I <CU 1 1 ^\>C*UCt4tog LfCLOLwXXV^piiaK^ xwpxxwcxtivjii 

nrotein/hvnothetical nrotein. tTOncation/fiision 


SAG1872 


200 


hypothetical protein


SAG1873


443


prophage LambdaSa2, replicative DNA helicase


SAG1874


87


hypothetical protein


SAG1875


94


conserved hypothetical protein


SAG1876


176


prophage LambdaSa2, HNH endonuclease family protein


SAG1877


236


prophage LambdaSa2, antirepressor protein, putative


SAG1878


102


conserved domain protein


SAG1879


156


hypothetical protein


SAG1880


54


hypothetical protein


SAG1881 


51 


hypothetical protein 


SAG1882 


120 


prophage LambdaSa2, repressor protein, putative


SAG1883 


128 


conserved hypothetical protein 


SAG1884 


134 


hypothetical protein 


SAG1885 


356 


prophage LambdaSa2, site-specific recombinase, phage integrase 
family 


SAG1886 


32 


hypothetical protein 


SAG1887 


689 


Na+/H+ exchanger family protein 


oAulooo 


f 0 


hypothetical protein




^17 


microcin immunity protein MccF, putative


QAni CQA 


£^1 
O j X 


endopeptidase O


OAVJ 1 07 1 


D£ / 


oxidoreductase, Gfo/Idh/MocA family


c Am ftoo 

oAtJlo^Z 




membrane protein, putative


dAvxIoso 




hypothetical protein


Q A 8QA 


914 
Zl*r 


cyclic nucleotide-binding domain protein


c Am co< 
oALrloiO 


9AA 


polypeptide deformylase


Q Am BOA 




sugar binding transcriptional regulator RegR


QAA1 00*7 


£1A 


conserved hypothetical protein


C A m GOO 


971 

Z / X 


PTS system, IID component


C Am QQQ 

oAGl ©99 


900 
Zoo 


PTS system, IIC component 


SAG1900 


164 


PTS system, IIB component


SAG1901


398


glucuronyl hydrolase


SAG1902


144


PTS system, IIA component


SAG1903


34


hypothetical protein


SAG1904


270


oxidoreductase, short-chain dehydrogenase/reductase family


SAG1905


212


conserved hypothetical protein


SAG1906


335


carbohydrate kinase, PfkB family


SAG1907


212


2-dehydro-3-deoxyphosphogluconate aldolase/4-hydroxy-2-oxoglutarate aldolase


SAG1908


499


hypothetical protein


SAG1909


204


nitroreductase family protein


SAG1910


141


transcriptional regulator, MarR family


SAG1911


140o 


DNA polymerase III, alpha subunit, Gram-positive type 


Q a m oi 9 
oALfi^iz 


1 0A 


N-acetylmuramoyl-L-alanine amidase, family 4 protein


o a m qiq 


/CI 7 
01 / 


prolyl-tRNA synthetase




AI O 


membrane-associated zinc metalloprotease, putative


q Am 01 ^ 


9AA 


phosphatidate cytidylyltransferase


QAm Q1 A 


7^n 


undecaprenyl diphosphate synthase






preprotein translocase, YajC subunit


QAm 01 ft 


1 1 A 


bacteriocin transport accessory protein, putative


c Am 010 


JO/ 


malate oxidoreductase


jau i vzu 


AA^ 


citrate carrier protein, CCS family


D/\vJ 1 7Z 1 


JUO 


sensor histidine kinase


O/Tk-VJ 1 7ZZ 


990 
zzy 


response regulator 


^1x1093 


^31 


UDP-glucose 4-epimerase


Or\VJ 1 7a*t 




glucan 1,6-alpha-glucosidase


Q Aril 095 
O/VVJ 1 iJZ J 


177 


sugar ABC transporter, ATP-binding protein


QAm 09A 
o/\.vJ 1 7ZO 




neiix-turn-neiix aomain protein, ns-type 


SAG1927


298 


lac^C nrntein 


SAG1928 


325 


tagatose 1,6-diphosphate aldolase 


SAG1929 


310 


tagatose-6-phosphate kinase 


SAG1930 


171 


galactose-6-phosphate isomerase, LacB subunit 


SAG1931 


141


galactose-6-phosphate isomerase, LacA subunit 


SAG1932


816 


neuraminidase-related protein 


SAG1933 


482 


PTS system, IIC component, putative 


SAG1934 


101 


PTS system, IIB component, putative 


SAG1935 


157 


PTS system, IIA component, putative


SAG1936


258


lactose phosphotransferase system repressor


SAG1937


NA


streptococcal histidine triad family protein, degenerate


SAG1938


307


adhesion lipoprotein


SAG1939


147


protein of unknown function TIGR00256


SAG1940


738


GTP pyrophosphokinase family protein


SAG1941


800


2',3'-cyclic-nucleotide 2'-phosphodiesterase


SAG1942


151


nrdI protein


SAG1943


345


conserved hypothetical protein


SAG1944


165


conserved hypothetical protein


SAG1945


345


iron ABC transporter, iron-binding protein


SAG1946


257


DNA-binding response regulator


SAG1947


549


conserved hypothetical protein


SAG1948 


275 


PTS system, IID component


SAG1949


269 


PTS system, IIC component


SAG1950 




PTS system, IIB component


SAG1951 


141


PTS system, IIA component, putative


SAG1952


353


membrane protein, putative


SAG1953


60


hypothetical protein


SAG1954


384


membrane protein, putative


SAG1955


282


ABC transporter, ATP-binding protein


SAG1956


96


conserved hypothetical protein, truncation


SAG1957


250


response regulator


SAG1958


276


conserved hypothetical protein


SAG1959


727


PTS system, IIABC components


SAG1960


551


sensor histidine kinase


SAG1961 


225 


phosphate regulon response regulator PhoB


SAG1962


218


phosphate transport system regulatory protein PhoU, putative


SAG1963


253


phosphate ABC transporter, ATP-binding protein


SAG1964


292


phosphate ABC transporter, permease protein


SAG1965


281


phosphate ABC transporter, permease protein


SAG1966


293


hemolysin precursor, putative


SAG1967


195


hypothetical protein


SAG1968


246


conserved hypothetical protein TIGR00046


SAG1969


317


ribosomal protein L11 methyltransferase


SAG1970


102


conserved hypothetical protein


SAG1971


41


hypothetical protein


SAG1972


238


transcriptional regulator, MerR family


SAG1973


156


acetyltransferase, GNAT family


SAG1974


152


MutT/nudix family protein


SAG1975 


47 


hypothetical protein 


SAG1976 


156 


conserved hypothetical protein 


SAG1977 


163 


acetyltransferase, GNAT family 


SAG1978 


422 


ATPase, AAA family 


SAG1979 


253 


membrane protein, putative


SAG1980 


300 


ABC transporter, ATP-binding protein 


SAG1981 


68 


hypothetical protein 


SAG1982 


359 


transcriptional regulator, Cro/CI family 


SAG1983


105


conserved hypothetical protein


SAG1984


188


conserved hypothetical protein TIGR00730


SAG1985




hypothetical protein


SAG1986


375


site-specific recombinase, phage integrase family


SAG1987


61


conserved hypothetical protein


SAG1988


349


conserved hypothetical protein


SAG1989


13Q 


hypothetical protein


SAG1990


197 
i^ / 


hypothetical protein


SAG1991


204


transcriptional regulator, Cro/CI family


SAG1992


SI R 


protein of unknown function


SAG1993


373


site-specific recombinase, phage integrase family


SAG1994


10R 
1UO 


conserved hypothetical protein




910 


hypothetical protein






cell wall surface anchor family protein, putative




1 R9 


hypothetical protein


QAni QQQ 

o/wji yy& 


4^7 


hypothetical protein


Q A O-l QOO 

&jt\\jiyyy 


/17 
**/ 


hypothetical protein


q a rv>oon 


OOO 


membrane protein, putative


Q AfV>00 1 


/DO 


conjugal iTansier proicin, mxeiTupiion-^ 


c Afv>oo9 


19Q 

±z.y 


loijoi, xransposase wno 




197 


lo 1 Jul, LTanspOSaSc \JTU\ 




67 

O / 


conjugal irdnsier protein, inxeirupiion M iN 






conserved hypothetical protein


*1AG9006 


RR 

OO 


conserved hypothetical protein




317 


conserved hypothetical protein


^AG900R 


R4 

0*r 


conserved hypothetical protein


^AG900Q 


RR 

OO 


conserved hypothetical protein


SAG2010


157


hypothetical protein


SAG2011


160


conserved hypothetical protein


SAG2012 


QO 


hypothetical protein


SAG2013 


IRQ 

107 


hypothetical protein


SAG2014 


449 


hypothetical protein


c AG2015 


yy 


transcriptional regulator, Cro/CI family


SAG2016 


125 


hypothetical protein


SAG2017


429


transcriptional regulator, Cro/CI family


SAG2018


553


FtsK/SpoIIIE family protein


SAG2019


153


hypothetical protein


SAG2020


98


hypothetical protein


SAG2021


826


cell wall surface anchor family protein


SAG2022


417


transposase, ISL3 family


SAG2023 


546 


mercuric reductase 


SAG2024 


130 


mercuric resistance operon regulatory protein MerR 


SAG2025 


522 


Mn2+/Fe2+ transporter, NRAMP family 


SAG2026 


240


membrane protein, putative 


SAG2027 


205 


ABC transporter, ATP-binding protein 


SAG2028 


36 


conserved hypothetical protein 


SAG2029 


284 


streptomycin resistance protein 


SAG2030 


130 


hypothetical protein 




202 


hypothetical protein


SAG2032


111


conserved hypothetical protein


SAG2033


162


acetyltransferase, GNAT family


SAG2034


947


membrane protein, putative


SAG2035


100


ABC transporter, ATP-binding protein


SAG2036


6R 


hypothetical protein


SAG2037


15R 


transcriptional regulator, Cro/CI family


<5AG9fttR 


904 
Zvt 


x /tlx z xaiXLiiy protein 


^AG903Q 


OR 


conserved hypothetical protein


qag904h 


1R6 


/*r\MC<^-r\7i^H V»\rr*/vf'l*f*ti/"»ia1 t^iwh^io ' 1 "1 G-R 007*^0 

vuiiocr vcu iiypuLXiciicai piuiviu iivjivvv/ jv * 


SAG2041 


287 


protease, putative 


Q Ari9HA9 
or\OZU*fZ 


1 Art 

JLUU 


rhodanese family protein


SAG2043 


255 


cAMP factor 


SAG2044


oz 


hypothetical protein 


SAG2045


1 *7Q 


DNA topology modulation protein FlaR, putative 


SAG2046


1/£1 

3ol 


glycerol dehydrogenase, putative 


SAG2047


one 
zJ5 


conserved hypothetical protein 


SAG2048


014 


5-methyltetrahydrofolate--homocysteine methyltransferase, 
putative 


SAG2049


/4j 


5-methyltetrahydropteroyltriglutamate--homocysteine
methyltransferase




1 rt7 
IV/ 


conserved hypothetical protein




9^0 


branched-chain amino acid transport protein AzlC, putative


^AG9059 




hypothetical protein


^AG90S^ 


1 ^70 


serine protease, subtilase family, putative


*!AG905zl 


99R 
zzo 


DNA-binding response regulator


^AG9055 


469 


sensor histidine kinase


<sag9056 


909 
zuz 


UllxUXllWoUlIlC aodClllUiy v lClal,CU piL/lClll 


<5AG9057 


Rll 


leucyl-tRNA synthetase


SAG905R 


415 


lllajyi IdL/lllldlV/l lcllllliy pivKCUl 


SAG2059 


9R1 


protein of unknown function


SAG2060 


1QR 


glycosyl transferase, family 8


SAG2061 


401 


glycosyl transferase, family 8


SAG2062 


179 


transcription antitermination protein NusG


SAG2063 


610 


pauii/gwiiiwiiy pivsLvixi) puiauvc 


SAG2064 


57 


preprotein translocase, SecE subunit, putative


SAG2065 


50 


ribosomal protein L33


SAG2066 


771 
/ / j 


penicillin-binding protein 2A


SAG2067 


904 


ribosomal large subunit pseudouridine synthase, RluD subfamily


SAG2068 


546 


conserved hypothetical protein


SAG2069 


403 


phosphopentomutase 


SAG2070 


223 


deoxyribose-phosphate aldolase 


SAG2071 


400 


Na+ dependent nucleoside transporter 


SAG2072 


259 


uridine phosphorylase 


SAG2073 


245 


transcriptional regulator, GntR family 


SAG2074 


540 


60 kDa chaperonin


SAG2075 


94 


chaperonin, 10 kDa


SAG2076 


267 


ABC transporter, ATP-binding protein


SAG2077


298


ABC transporter, permease protein


SAG2078


320


protein of unknown function/lipoprotein, putative


SAG2079


265


hydrolase, haloacid dehalogenase-like family


SAG2080


286


glyoxalase family protein


SAG2081


943


conserved hypothetical protein


SAG2082


90S 

ZU J 


anaerobic ribonucleoside-triphosphate reductase activating protein


SAG2083


1« 


acetyltransferase, GNAT family


SAG20R4 




viiuidiL'C xawixix iviviivx 9 puuxiivc 


SAG2085


47


conserved hypothetical protein


SAG2086


793 

/ZO 


anaerobic ribonucleoside-triphosphate reductase


SAG2087


405


membrane protein, putative


SAG2088


40


hypothetical protein


SAG2089


105


conserved hypothetical protein


SAG2090


136 

1 JO 


conscrveu nypoineiicai proiein 1 i\jixv/uz3u 


SAG2091


RR 


conserved hypothetical protein


SAG2092


139 


conserved hypothetical protein






recA protein


SAG2094


"MA 


competence/damage-inducible protein CinA, authentic frameshift


SAG2095




DNA-3-methyladenine glycosylase I


SAG2096




Holliday junction DNA helicase RuvA


SAG2097


41 R 


transporter, putative


SAG2098


6S0 


DNA mismatch repair protein HexB


SAG2099


3^ 


hypothetical protein


SAG2100


67


cold shock protein, CSD family


SAG2101


R5R 


DNA mismatch repair protein HexA


SAG2102


145


arginine repressor ArgR, putative


SAG2103


S63 


arginyl-tRNA synthetase


SAG2104 


109 
1 v/z 


conserved hypothetical protein


SAG2105 


200 


conserved hypothetical protein


SAG2106 


314 


conserved hypothetical protein


SAG2107 


5R3 




SAG2108 


426 


histidyl-tRNA synthetase


SAG2109


60


ribosomal protein L32


SAG2110


49


ribosomal protein L33


SAG2111 


173 

A / _J 


conserved hypothetical protein


SAG2112 

U/\\JA> AAA* 


494 




SAG2113 


82 


conserved hypothetical protein


SAG2114


342


conserved hypothetical protein


SAG2115


143


hypothetical protein


SAG2116


151


conserved hypothetical protein


SAG2117 


71 


hypothetical protein 


SAG2118 


306 


transcriptional regulator, Cro/CI family 


SAG2119 


373 


conserved domain protein 


SAG2120 


269 


hypothetical protein 


SAG2121 


223


hypothetical protein 


SAG2122 


223 


DNA-binding response regulator 


SAG2123 


454 


sensor histidine kinase 


SAG2124 


517 


membrane protein, putative 


SAG2125 


308 


carbamate kinase 


SAG2126 


332 


ornithine carbamoyltransferase


SAG2127


431


sensor histidine kinase


SAG2128


277


response regulator


SAG2129


240


amino acid ABC transporter, ATP-binding protein


SAG2130


504


amino acid ABC transporter, amino acid-binding protein/permease
protein


SAG2131


847


membrane protein, putative


SAG2132


247


conserved hypothetical protein


SAG2133 


118 


conserved hypothetical protein 


SAG2134 


772 


membrane protein, putative


SAG2135


179


transcriptional regulator, TetR family, putative


SAG2136


98


conserved hypothetical protein


SAG2137


203


ribosomal protein S4


SAG2138


95


conserved hypothetical protein


SAG2139


451


replicative DNA helicase


SAG2140


150


ribosomal protein L9


SAG2141 


660 


DHH family protein


SAG2142 


613 


glucose inhibited division protein A


SAG2143


203


membrane protein, putative


SAG2144


373


tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase


SAG2145 


222 


L-serine dehydratase, iron-sulfiir-dependent, beta subunit 


SAG2146 


290 


T .—serine deb vdrataQP iron-<;n1 "fiir-YiPnendpnt nlnViA Qiihimit 

J-i av/l X1XW UVUjUl uUlaV) XI Is 11 OUilUl^lwpvUUUllj CLI^JXICL oLXIJUXXll 


SAG2147 


234 


nTotein of unknown mnction/linonrotein nntative 

Lyxv/kwxxx xjx txxxxxxiv/ wxi luiiviiuii/ iiys\s^si \j i^iii 9 puiuiiYb 


SAG2148 


179 


LvslS/f domain nrotein 

laSjriSivx \x miw xxx piviviii 


SAG2149 


264 


cobalt transnort familv nrotein 

vuucui n vi i upvi ti xcixxxxxy jl»x\j iwiii 


SAG2150 


280 


ARC Iran^TiOTter AT"P-binHin<y "nrotein 

jXUV> U CUli) U vl Ivl ) iX X X L/l 1 Hilt UlV/lwlll 


SAG2151 


279 


ABC transnort er ATP-bindinc? nrotein 


SAG2152 


180 


CDP-diacvl el vcerol— lvcerol-^-nhosnhate 1 - 

v^x^x umvj ifiij vvi x/x gij vvi ui -J l/xxv/ol/xxcllw -J 

phosphatidyltransferase 


SAG2153 


427 


peptidase, M16 family


SAG2154 


414 


conserved hypothetical protein


SAG2155 


117 


conserved hypothetical protein 


SAG2156 


369 


recF protein 


SAG2157 


278 


transporter, putative 


SAG2158 


220 


transcriptional regulator, Cro/CI family 


SAG2159 


493 


inosine-5'-monophosphate dehydrogenase


SAG2160 


161 


transcriptional regulator, ArgR family 


SAG2161 


226 


transcriptional regulator, Crp/Fnr family 


SAG2162 


234 


conserved hypothetical protein 


SAG2163 


410 


arginine deiminase 


SAG2164 


136 


acetyltransferase, GNAT family 


SAG2165 


337 


ornithine carbamoyltransferase 


SAG2166 


475 


arginine/ornithine antiporter 


SAG2167 


318


carbamate kinase 


SAG2168 


341 


tryptophanyl-tRNA synthetase 


SAG2169 


230 


membrane protein, putative 


SAG2170 


290 


conserved hypothetical protein 
Table 1: Complete list of GBS predicted genes



ORF 


Size 
(a.a.) 


Annotation


SAG2171 


539 


ABC transporter, ATP-binding protein 


SAG2172 


859 


ABC transporter, permease protein, putative 


SAG2173 


159 


conserved hypothetical protein TIGR00246 


SAG2174 


409 


serine protease 


SAG2175 


257 


partitioning protein, ParB family 
Table 2



ORF 


Size 
(aa)


Signal 
Peptide 


Sortase 
motif 


Lipo- 
protein 


Other 


Western 
blot 


FACS 


GBS 
specific 


Annotation


SAG0017


447 


+ 














PcsB


SAG0031 


299 


+ 














peptidase, M23/M37 family 


SAG0032 


434 


+ 










+ 




group B streptococcal surface immunogenic protein 


SAG0034 


438 


+ 




+ 




+ 


+ 




sugar ABC transporter, sugar-binding protein 


SAG0051


126 


+ 








+ 


+ 




MORN motif family protein


SAG0079 


212 








+ 


+ 


+ 




adenylate kinase 


SAG0086 


85 






+ 








+ 


lipoprotein, putative 


SAG0093 


250 


+ 








+ 


+ 




D-alanyl-D-alanine carboxypeptidase family protein


SAG0094 


191 


+ 














N-acetylmuramoyl-L-alanine amidase, family 4 protein


SAG0108


308 


+ 














conserved hypothetical protein 


SAG0114 


322 






+ 










ribose ABC transporter, periplasmic D-ribose-binding protein


SAG0124 


356 
















sensor histidine kinase 


SAG0132 


294 


+ 










+ 




SPFH domain/Band 7 family protein 


SAG0134 


96 


+ 












+ 


hypothetical protein 


SAG0146 


395 


+ 














penicillin-binding protein 4, putative 


SAG0147 


411 


+ 














D-alanyl-D-alanine carboxypeptidase family protein 


SAG0148 


551 




- 


+ 




+ 


- 




oligopeptide ABC transporter, substrate-binding protein, 
putative 


SAG0166 


123 


+ 














conserved domain protein 


SAG0176 


94 


+














conserved hypothetical protein 


SAG0187


542 


+ 




+ 






+ 




oligopeptide ABC transporter, oligopeptide-binding protein


SAG0206 


6<J 






+ 








+ 


lipoprotein, putative 


SAG0213 


39 


+ 












+ 


hypothetical protein 


SAG0231 


135 


+ 














hypothetical protein 


SAG0242 


308 






+ 










amino acid ABC transporter, amino acid-binding protein














+ 




+ 


protein of unknown function/lipoprotein, putative 


SAG0255 


315 


+ 














conserved hypothetical protein 


SAG0257 


53 






+ 








+ 


lipoprotein, putative 



SAG0265 


235 


+ 








+ 




+ 


conserved hypothetical protein 


SAG0290 


27C 


+ 








+ 


+ 




ABC transporter, substrate-binding protein 


SAG029S 


75C 


+ 














penicillin-binding protein 1A 





SAG0306


535 


+ 














KH domain protein 


SAG0321


339 


+ 














sensor histidine kinase, putative 


SAG0329 


106 


+ 














PTS system, cellobiose-specific IIB component


SAG0368 


435 


+ 








+ 


+ 




protein of unknown function 


SAG0371 


167 


+ 












+ 


hypothetical protein 


SAG0383 


334 


+ 




+ 




+ 






protein of unknown function/lipoprotein, putative 


SAG0392 


521 


+ 


+ 






+ 


+ 




cell wall surface anchor family protein 


SAG0394 


345 








+ 








sensor histidine kinase 


SAG0405


347 


+ 




+ 




+ 


+ 




protein of unknown function/lipoprotein, putative 


SAG0406 


299 


+ 














UTP-glucose-1-phosphate uridylyltransferase


SAG0407


338 
















glycerol-3-phosphate dehydrogenase (NAD(P)+)


SAG0416 


1233 


+ 


+ 






+ 


+ 




protease, putative 


SAG0421 


1055 




+ 






+ 






cell wall surface anchor family protein 


SAG0433 


1389 




+ 












surface protein Rib 


SAG0437 


123 






+ 










lipoprotein, putative 


SAG0451


149 


+ 




+ 








+ 


bacteriocin transport accessory protein, putative


SAG045S 


357 


+ 














conserved hypothetical protein 


SAG0472 


126 


+ 








+ 






rhodanese-like family protein 


SAG0482 


84 


+ 














YGGT family protein 


SAG0499 


275 








+ 








hemolysin A 


SAG0503


279 


+ 








+ 


+ 




lipase/acylhydrolase


SAG0504 


200 


+ 










• 




conserved hypothetical protein 


SAG0506


65 














+ 


hypothetical protein 


SAG0521


236 


+ 














carboxymethylenebutenolidase-related protein


SAG0535


506 










+ 






zinc ABC transporter, zinc-binding adhesion lipoprotein


SAG0596


o/C 








+ 








prophage LambdaSa1, pblA protein, internal deletion


oAOUoOj 


111 








+ 








conserved hypothetical protein


SAG0604 


239 








+ 








prophage LambdaSa1, lysin, putative


SAG0617 


439 








+ 








sensor histidine kinase VncS 


SAG0624 


574 


+ 














septation ring formation regulator EzrA, putative


SAG0629 


354 


+ 














conserved domain protein 


SAG0635


245 


+ 








+ 






acid phosphatase, class B 


SAG0638 


109 


+ 














cell wall surface anchor family protein, interruption-N





SAG0645 


554 




+ 






+ 


+ 




cell wall surface anchor family protein 


SAG0646 


307 


+ 


+ 






+ 






cell wall surface anchor family protein 


SAG0647 


305 
















sortase family protein 


SAG0649


890 




+ 






+ 


+ 




cell wall surface anchor family protein, putative 


SAG0658


383 


+ 




+ 










lipoprotein, putative 


SAG0675 


171 


+ 














putative secreted protein 


SAG0676 


885 








+ 








proteinase, putative 


SAG0677 


1062 




+ 












hypothetical protein 


SAG0679 


343 


+ 




+ 




+ 






protein of unknown function 


SAG0680 


339 


+ 














protein of unknown function 


SAG0681 


353 


+ 














conserved domain protein


SAG0686 


261 


+ 








+ 


+ 




DNA-entry nuclease, putative 


SAG0714 


188 


+ 












+ 


conserved hypothetical protein 


SAG0717 


266 


+ 








+ 


+ 




amino acid ABC transporter, amino acid-binding protein 


SAG0720 


449 
















sensory box histidine kinase


SAG0738 


132 


+ 














conserved hypothetical protein


SAG0739 


143 
















conserved hypothetical protein


SAG0742 


428 








+ 


+ 


+ 




peptidase, U32 family 


SAG0755 


282 


+ 














peptidase, U32 family 


SAG0757 


129 


+ 




+ 




+ 






protein of unknown function/lipoprotein, putative 


SAG0764 


230 








+ 




+ 




phosphoglycerate mutase family protein


SAG0765


681 


+ 














penicillin-binding protein 2b 


SAG0771


512 


+ 


+ 






+ 


+ 


+ 


cell wall surface anchor family protein 


SAG0776


276 


+ 




+ 










YacC family protein, putative 


SAG0777












+ 


+ 




ATP-dependent RNA helicase, DEAD/DEAH box family

SAG0785 


33C 


+ 














conserved hypothetical protein

SAG0808 


309 


+ 




+ 




+ 


+ 




protease maturation protein, putative 


SAG0824 


417 
















polysaccharide deacetylase family protein 


SAG0832 


753 


+ 








+ 


+ 




protein of unknown function 


SAG0833 


181 


+ 














hypothetical protein 


SAG0867 


63 


+ 














conserved hypothetical protein


SAG0868 


285 


+ 








+ 






DNA-entry nuclease 


SAG0886 


319 












+ 




protein of unknown function 


SAG0904 


56 


+ 












+


hypothetical protein


SAG0907 


877 


+ 




+ 




+ 




protein of unknown function/lipoprotein, putative


SAG0926 


333 


+ 












Tn916, NLP/P60 family protein


SAG0942 


185 










+ 


+ 


signal peptidase I, putative


SAG0949 


276 


+ 








+ 


+ 




amino acid ABC transporter, amino acid-binding protein


SAG0954 


349 






+ 




+ 






protein of unknown function/lipoprotein, putative


SAG0961 


247 










+ 


- 




sortase SrtA 


SAG0963 


320 


+ 














conserved hypothetical protein 


SAG0971 


282 






+ 




+ 






protein of unknown function/lipoprotein, putative


SAG0973 


320 


+ 












+ 


nisin-resistance protein, putative 


SAG0977 


312 








+ 








sensor histidine kinase 


SAG0979 


553 


+ 




+ 




+ 


* 




ABC transporter, substrate-binding protein 


SAG0984 


437 


+ 














sensor histidine kinase CiaH 


SAG0992 


286 


+ 




+ 




+ 


+ 




phosphate ABC transporter, phosphate-binding protein 


SAG1007


342 


+ 




+ 




+ 






iron-compound ABC transporter, iron-compound-binding 
protein 


SAG1014 


190 


+ 








* 


- 




conserved hypothetical protein 


SAG1018 


40 






+








+ 


lipoprotein, putative 


SAG1024 


183 


+ 




+ 










lipoprotein, putative 


SAG1029 


101 


+ 














hypothetical protein 


SAG1030 


304 


+ 








+ 


+ 




protein of unknown function 


SAG1037 


157 


+ 












+ 


hypothetical protein 


SAG1052 


47 




+ 












cell wall surface anchor family protein, putative 


SAG1072 


20G 


+ 














conserved hypothetical protein


SAG1094


27fi 








+ 


+ 


+ 




conserved hypothetical protein


SAG1108


351 


+ 








+ 






spermidine/putrescine ABC transporter, spermidine/putrescine-binding protein


SAG1121


29S 


+ 














polysaccharide deacetylase family protein 


SAG1126 


22* 


+ 








+ 


+ 




protein of unknown function 


SAG1127


44( 


+












+ 


conserved domain protein 


SAG1130


4S 


+












+ 


hypothetical protein 


SAG1138




+














conserved hypothetical protein


SAG1139 


19: 


+














conserved hypothetical protein


SAG1149


207 


+ 




+ 








lipoprotein, putative


SAG1184


236 
















conserved hypothetical protein 


SAG1186


553 








+ 








metallo-beta-lactamase superfamily protein


SAG1189


334 
















conserved hypothetical protein


SAG1190


551 
















adherence and virulence protein A 


SAG1197


1072 


+ 














hyaluronidase 


SAG1201 


367 


+ 














iminodiacetate oxidase, putative


SAG1206


854 


+ 














conserved domain protein 


SAG1214 


58 


+ 














hypothetical protein 


SAG1216 


1252 




+ 






+ 


- 




pullulanase, putative


SAG1227 


198 


+ 








+ 






protein of unknown function 


SAG1233 


822 


+ 








+ 






streptococcal histidine triad family protein


SAG1234 


306 


+ 




+ 




+ 


+ 




laminin-binding surface protein 


SAG1238 


202 
















hypothetical protein 


SAG1283 


1631 




+ 






+ 


+ 




agglutinin receptor 


SAG1313 


56 


+ 














conserved hypothetical protein 


SAG1327 


409 
















sensor histidine kinase 


SAG1331 


979 


+ 


+ 






+ 






R5 protein 


SAG1333 


690 




+ 






+ 


+ 




5'-nucleotidase family protein


SAG1350 


544 


+ 












t 


surface antigen-related protein 


SAG1361


414 
















conserved hypothetical protein 


SAG1371 


392 


+ 














conserved hypothetical protein


SAG1393 


310 






+ 










iron compound ABC transporter, substrate-binding protein 


SAG1404 


308 


+ 


+ 






+ 






cell wall surface anchor family protein 


SAG1405 


294 


+ 






+ 


+ 


+ 




sortase family protein 


SAG1406 


293 


+ 














sortase family protein 


SAG1407 


705 


+ 


+ 








+ 




cell wall surface anchor family protein 


SAG1408


901 
















cell wall surface anchor family protein 


SAG1419 


$71 






+ 








+ 


lipoprotein, putative 


SAG1431


26£ 






+ 










amino acid ABC transporter, amino acid-binding protein 


SAG1433 


375 


+ 














conserved hypothetical protein 


SAG1441 


41< 










+ 


+ 




maltose/maltodextrin ABC transporter, maltose/maltodextrin-binding protein





SAG1462 


970 




+ 










cell wall surface anchor family protein


SAG1473


192 


+ 












+ 


cell wall surface anchor family protein


SAG1474 


680 


+ 


+ 












amidase family protein


SAG1483 


78 


+ 














preprotein translocase, SecG subunit


SAG1488


195 


+ 














dephospho-CoA kinase


SAG1491 


530 


+ 














hypothetical protein 


SAG1508 


590 








+ 


+ 


- 




57 kDa Myosin-crossreactive streptococcal antigen 


SAG1518


538 


+ 




+ 










peptide ABC transporter, peptide-binding protein 


SAG1530


267 


+ 




+ 




+ 


- 




peptidyl-prolyl cis-trans isomerase, cyclophilin-type


SAG1533 


308 


+ 














manganese ABC transporter, manganese-binding adhesion lipoprotein


SAG1544 


232 


+ 














gluconate 5-dehydrogenase, putative


SAG1551


67 


+ 












+ 


hypothetical protein 


SAG1552 


719 


+ 














conserved hypothetical protein


SAG1553 


477 


+ 












+ 


hypothetical protein 


SAG1562 


280 


+ 














conserved hypothetical protein


SAG1582 


388 


+ 








+ 


- 




branched-chain amino acid ABC transporter, amino acid-binding protein


SAG1590 


449 








+ 


+ 


+ 




potassium uptake protein, Trk family


SAG1601 


79 


+ 














conserved hypothetical protein


SAG1610


285 






+ 




+ 


• 




amino acid ABC transporter, substrate-binding protein


SAG1618 


1032 








+ 


+ 


+ 




Snf2 family protein 


SAG1624


501 


+ 














sensor histidine kinase CsrS 


SAG1628 


184 


+ 














IcmA protein 


SAG1631 


223 


+ 








+ 


* 




potassium uptake protein, Trk family, putative


SAG1641 


274 


+ 








+ 


- 




YaeC family protein 


SAG1642


27*3 


+ 














ABC transporter, substrate-binding protein 


SAG1683


512 


+ 














immunogenic secreted protein, putative


SAG1706


23£ 


+ 














conserved hypothetical protein 


SAG1745 


14f 


+ 












+ 


hypothetical protein 


SAG1752


39( 


+ 














conserved hypothetical protein TIGR00275


SAG1759


23( 








+ 


+ 


+ 




protein of unknown function 


SAG1762 


16< 


+ 














conserved hypothetical protein


SAG1767 


289 


+ 




+ 








acid phosphatase


SAG1768


336 










+ 


+ 


glyceraldehyde 3-phosphate dehydrogenase


SAG1774


424 


+ 












conserved hypothetical protein


SAG1786


130 


+ 








+ 


• 


protein of unknown function


SAG1787 


420 














( 


lltD protein 


SAG1791 


395 
















sensor histidine kinase 


SAG1822 


272 


+ 








+ 


- 




protein of unknown function


SAG1823 


418 








+ 


+ 


+ 




protein of unknown function 


SAG1837


468 








+ 








prophage LambdaSa2, lysin, putative 


SAG1838


109 


+ 














prophage LambdaSa2, holin, putative 


SAG1839 


136 


+ 














conserved hypothetical protein 


SAG1842 


1224 








+ 








prophage LambdaSa2, PblB, putative 


SAG1912 


194 


+ 














N-acetylmuramoyl-L-alanine amidase, family 4 protein 


SAG1921


508 


+ 






*» 








sensor histidine kinase 


SAG1932


816 


+ 














neuraminidase-related protein 


SAG1938 


307 


+ 




+ 




+ 


- 




adhesion lipoprotein 


SAG1941 


800 


+ 


+ 






+ 


- 




2',3'-cyclic-nucleotide 2'-phosphodiesterase


SAG1945


345 


+ 














iron ABC transporter, iron-binding protein 


SAG1947 


549 








+ 








conserved hypothetical protein


SAG1960 


551 








+ 


+ 


+ 




sensor histidine kinase 


SAG1966 


293 






+ 




+ 


* 




hemolysin precursor, putative 


SAG1996 


263 


+ 














cell wall surface anchor family protein, putative 


SAG1997 


182 


+ 














hypothetical protein 


SAG1998


45*3 


+ 














hypothetical protein 


SAG2021 


82< 
















cell wall surface anchor family protein 


SAG2043 



25« 


+ 














cAMP factor 


SAG2053 


157C 


+














serine protease, subtilase family, putative 


SAG2055 


46: 


I 






+ 








sensor histidine kinase 


SAG20S6 


20; 


+












+ 


chromosome assembly-related protein 


SAG2063 


63( 


+


+ 












pathogenicity protein, putative 


SAG2078 


32< 


+








+ 






protein of unknown function/lipoprotein, putative


SAG2094 




+ 








+ 


+ 




competence/damage-inducible protein CinA, authentic frameshift


SAG2121 


223 


+ 












+ 


hypothetical protein 


SAG2123 


454 


+ 














sensor histidine kinase


SAG2141 


660 


+ 








+ 






DHH family protein 


SAG2147 


234 


+ 




+ 




+ 


+ 




protein of unknown function/lipoprotein, putative 


SAG2148 


179 
















LysM domain protein 


SAG2174 


409 
















serine protease 


SAG0013 


428 


+ 














protein of unknown function 
Table 3

ORF Annotation

SAG0038 conserved hypothetical protein
SAG0048 transcriptional regulator, Cro/CI family
SAG0091 transcriptional regulator ComX1, putative
SAG0137 conserved hypothetical protein
SAG0686 DNA-entry nuclease, putative
SAG0770 membrane protein, putative
SAG0868 DNA-entry nuclease
SAG1143 conserved hypothetical protein
SAG1233 streptococcal histidine triad family protein
SAG1596 integrase/recombinase, phage integrase family
SAG1616 conserved hypothetical protein
SAG1721 conserved hypothetical protein
Table 5

Strain       Source       Capsular serotype   Reference

090          Lancefield   Ia
515          Houston      Ia                  (1)
A909         Lancefield   Ia                  (2)
Davis        Channing     Ia
DK1          Houston      Ia
DK8          Houston      Ia
H36b         Lancefield   Ib                  (2)
7357b        Channing     Ib                  (3)
18RS21       Lancefield   II                  (4)
DK21         Houston      II
COH1         Seattle      III                 (5)
COH31        Seattle      III                 (6)
D136C        Lancefield   III                 (4)
M781         Houston      III                 (7)
M732         Houston      III                 (8)
1169NT1      Atlanta      V                   (9)
2603V/R      Italy        V                   This study
CJB111       Houston      V                   (10)
JM9130013    Japan        VIII                (11)
SMU014       Japan        VIII                (11)
CJB110       Houston      Nontypeable         (12)
Table 6

Cluster 1

SAG0230 conserved hypothetical protein
SAG0231 hypothetical protein
SAG0232 hypothetical protein
SAG0233 hypothetical protein
SAG0234 hypothetical protein
SAG0235 hypothetical protein



Cluster 2 

SAG0222 conserved domain protein

SAG0223 conserved hypothetical protein, fusion

SAG0225 hypothetical protein 

SAG0226 recombination protein 

SAG0227 hypothetical protein 

SAG0228 conserved hypothetical protein 

SAG0229 conserved hypothetical protein



Cluster 3 

SAG0634 hypothetical protein 

SAG0635 acid phosphatase, class B 

SAG0636 conserved hypothetical protein 

SAG0638 cell wall surface anchor family protein, interruption-N

SAG0640 transposase OrfA, IS3 family 



SAG0642 hypothetical protein
SAG0643 chaperonin, 33 kDa, degenerate
SAG0644 transcriptional regulator, AraC family
SAG0645 cell wall surface anchor family protein
SAG0646 cell wall surface anchor family protein
SAG0647 sortase family protein
SAG0648 sortase family protein
SAG0649 cell wall surface anchor family protein, putative
SAG0650 sortase family protein
SAG0651 protein of unknown function

Cluster 4

SAG1898 PTS system, IID component
SAG1899 PTS system, IIC component
SAG1900 PTS system, IIB component
SAG1901 glucuronyl hydrolase
SAG1902 PTS system, IIA component
SAG1905 conserved hypothetical protein
SAG1906 carbohydrate kinase, PfkB family

Cluster 5

SAG0247 hypothetical protein
SAG0248 hypothetical protein



SAG0249 hypothetical protein
SAG0674 hypothetical protein
SAG0675 putative secreted protein
SAG0676 proteinase, putative
SAG0677 hypothetical protein
SAG0680 protein of unknown function
SAG0681 conserved domain protein
SAG0684 ABC transporter, ATP-binding protein
SAG1698 conserved hypothetical protein

Cluster 6

SAG0261 IS1381, transposase OrfB
SAG0262 IS1381, transposase OrfA
SAG0965 IS1381, transposase OrfA
SAG0966 IS1381, transposase OrfB
SAG2002 IS1381, transposase OrfB

Cluster 7

SAG1027 conserved hypothetical protein
SAG1028 hypothetical protein
SAG1029 hypothetical protein
SAG1030 protein of unknown function
SAG1031 conserved domain protein



SAG1032 conserved hypothetical protein

Cluster 8

SAG1253 transposase, ISL3 family
SAG1254 mercuric reductase
SAG1255 mercuric resistance operon regulatory protein MerR
SAG2022 transposase, ISL3 family
SAG2023 mercuric reductase
SAG2024 mercuric resistance operon regulatory protein MerR

Cluster 9

SAG1993 site-specific recombinase, phage integrase family
SAG1994 conserved hypothetical protein
SAG1995 hypothetical protein
SAG1996 cell wall surface anchor family protein, putative
SAG1997 hypothetical protein
SAG1998 hypothetical protein
SAG2000 membrane protein, putative
SAG2001 conjugal transfer protein, interruption-C
SAG2007 conserved hypothetical protein
SAG2008 conserved hypothetical protein
SAG2009 conserved hypothetical protein
SAG2010 hypothetical protein



SAG2011 conserved hypothetical protein
SAG2012 hypothetical protein
SAG2016 hypothetical protein
SAG2017 transcriptional regulator, Cro/CI family
SAG2025 Mn2+/Fe2+ transporter, NRAMP family

Cluster 10

SAG1039 conserved hypothetical protein
SAG1447 conserved hypothetical protein
SAG1448 glycosyl transferase, group 1 family protein
SAG1449 preprotein translocase SecA subunit, putative
SAG1450 conserved domain protein
SAG1452 conserved hypothetical protein
SAG1453 preprotein translocase SecY family protein
SAG1454 glycosyl transferase, putative
SAG1455 glycosyl transferase, group 2 family protein
SAG1456 glycosyl transferase, family 8, degenerate
SAG1459 glycosyl transferase, family 8
SAG1460 glycosyl transferase, family 8
SAG1461 conserved hypothetical protein
SAG1462 cell wall surface anchor family protein
SAG1463 transcriptional regulator, RofA family, authentic point mutation
SAG1469 conserved hypothetical protein



SAG1471 conserved hypothetical protein
SAG1933 PTS system, IIC component, putative

Cluster 11

SAG0009 hypothetical protein
SAG0120 hypothetical protein
SAG0157 deoxyribonuclease-related protein, degenerate
SAG0186 hypothetical protein
SAG0216 hypothetical protein
SAG0236 hypothetical protein
SAG0307 hypothetical protein
SAG0308 ABC transporter, ATP-binding protein
SAG0311 DNA-binding response regulator, authentic point mutation
SAG0518 peptide chain release factor 2, programmed frameshift
SAG0553 hypothetical protein
SAG0555 prophage LambdaSa1, antirepressor, putative
SAG0564 conserved hypothetical protein
SAG0579 conserved hypothetical protein
SAG0580 conserved hypothetical protein, truncation
SAG0611 transposase, degenerate
SAG0637 transcriptional regulator, TetR family, putative, authentic frameshift
SAG0641 Tn5252, Orf 10 protein, degenerate
SAG0652 Tn5252, Orf 28 protein, degenerate



SAG0655 


conserved hypothetical protein 


SAG0678 


endopeptidase O, degenerate 


SAG0683 


transmembrane protein Vexp3, putative, degenerate 


SAG0855 


glycogen biosynthesis protein GlgD, authentic frameshift


SAG0898 


hypothetical protein 


SAG0899 


hypothetical protein 


SAG0901 


hypothetical protein 


SAG0902 


hypothetical protein 


SAG0903 


hypothetical protein 


SAG0917 


Tn916, hypothetical protein 


SAG0920 


Tn916, hypothetical protein 


SAG0922 


Tn916, hypothetical protein 


SAG0924 


Tn916, tetM leader peptide


SAG0928 


Tn916, hypothetical protein, authentic frameshift 


SAG0936 


Tn916, hypothetical protein 


SAG0943 


hypothetical protein 


SAG0972 


conserved hypothetical protein, authentic frameshift


SAG1023 


hypothetical protein 


SAG1080 


hypothetical protein 


SAG1123


hypothetical protein 


SAG1129 


hypothetical protein 


SAG1136 


conserved hypothetical protein 


SAG1217 


conserved hypothetical protein, authentic frameshift



SAG1231


transposase OrfB, IS3 family, degenerate


SAG1242


transposase OrfB, IS3 family, truncation 



b AG 130V 


hypothetical protein 


SAG1331


R5 protein 


SAG1437


hypothetical protein


SAG1445


MutT/nudix family protein, authentic frameshift


SAG1484 


ribosomal protein L33 


SAG1493 


hypothetical protein 


SAG1539


hypothetical protein 


SAG1543


conserved hypothetical protein, authentic frameshift


SAG1560


hypothetical protein 


SAG1568


phosphoserine aminotransferase, authentic frameshift 


SAG1570


conserved hypothetical protein 


SAG1601


conserved hypothetical protein 



oALrl044 


hypothetical protein 


o/\VJ 1 040 


hypothetical protein 


SAG1699 


hypothetical protein 


SAG1705 


peptidase, M24 family, authentic point mutation 


SAG1708


hypothetical protein 


SAG1857 


prophage LambdaSa2, HNH endonuclease family protein 


SAG1864 


hypothetical protein 


SAG1868


hypothetical protein 

SAG1869 prophage LambdaSa2, type II DNA modification methyltransferase, putative
SAG1872 hypothetical protein
SAG1874 hypothetical protein
SAG1876 prophage LambdaSa2, HNH endonuclease family protein
SAG1878 conserved domain protein
SAG1881 hypothetical protein
SAG1883 conserved hypothetical protein
SAG1886 hypothetical protein
SAG1903 hypothetical protein
SAG1937 streptococcal histidine triad family protein, degenerate
SAG1971 hypothetical protein
SAG1979 membrane protein, putative
SAG1980 ABC transporter, ATP-binding protein
SAG1981 hypothetical protein
SAG1982 transcriptional regulator, Cro/CI family
SAG1983 conserved hypothetical protein
SAG1984 conserved hypothetical protein TIGR00730
SAG1985 hypothetical protein
SAG1991 transcriptional regulator, Cro/CI family
SAG1992 protein of unknown function
SAG1999 hypothetical protein
SAG2004 conjugal transfer protein, interruption-N
SAG2039 conserved hypothetical protein
SAG2044 hypothetical protein
SAG2052 hypothetical protein
SAG2065 ribosomal protein L33
SAG2094 competence/damage-inducible protein CinA, authentic frameshift
SAG2099 hypothetical protein

Cluster 12

SAG1164 glycosyl transferase CpsJ(V)
SAG1165 glycosyl transferase CpsO(V)
SAG1166 glycosyl transferase CpsN(V)
SAG1167 polysaccharide biosynthesis protein CpsM(V)
SAG1168 polysaccharide biosynthesis protein CpsH(V)

Cluster 13

SAG0581 conserved hypothetical protein
SAG0582 conserved hypothetical protein
SAG0583 conserved hypothetical protein
SAG0585 conserved hypothetical protein
SAG0586 conserved hypothetical protein
SAG0587 prophage LambdaSa1, structural protein, putative
SAG0588 conserved hypothetical protein
SAG0589 conserved hypothetical protein
SAG0590 conserved hypothetical protein
SAG0591 conserved hypothetical protein
SAG0593 prophage LambdaSa1, structural protein
SAG0594 conserved hypothetical protein
SAG0595 conserved hypothetical protein
SAG0596 prophage LambdaSa1, pblA protein, internal deletion



Cluster 14 

SAG0915 Tn916, transposase
SAG0918 Tn916, hypothetical protein
SAG0919 Tn916, hypothetical protein
SAG0921 Tn916, transcriptional regulator, putative
SAG0925 Tn916, hypothetical protein
SAG0926 Tn916, NLP/P60 family protein
SAG0927 membrane protein, putative
SAG0929 Tn916, hypothetical protein
SAG0930 Tn916, hypothetical protein
SAG0931 Tn916, hypothetical protein
SAG0932 Tn916, transcriptional regulator, putative
SAG0933 Tn916, FtsK/SpoIIIE family protein
SAG0934 Tn916, hypothetical protein
SAG0935 Tn916, hypothetical protein
SAG0937 ABC transporter, ATP-binding protein, authentic frameshift



Cluster 15 




SAG1835 conserved hypothetical protein 

SAG1837 prophage LambdaSa2, lysin, putative 

SAG1839 conserved hypothetical protein 

SAG1840 hypothetical protein 

SAG1842 prophage LambdaSa2, PblB, putative 

SAG1843 conserved hypothetical protein 

SAG1844 conserved hypothetical protein 

SAG1849 hypothetical protein 

SAG1851 conserved domain protein 

SAG1852 conserved domain protein 

SAG1853 prophage LambdaSa2, protease, putative 

SAG1854 conserved hypothetical protein 

SAG1855 prophage LambdaSa2, terminase large subunit, putative 

SAG1856 hypothetical protein 

SAG1858 hypothetical protein 

SAG1859 prophage LambdaSa2, site-specific recombinase, phage integrase family 

SAG1860 conserved hypothetical protein 

SAG1861 prophage LambdaSa2, transcriptional regulator, Cro/CI family 

SAG1862 hypothetical protein 

SAG1863 prophage LambdaSa2, single-strand binding protein 

SAG1865 conserved hypothetical protein 

SAG1866 conserved hypothetical protein 
SAG1867 conserved hypothetical protein 

SAG1870 prophage LambdaSa2, DNA replication protein DnaC, putative 
SAG1871 prophage LambdaSa2, bacteriophage replication protein/hypothetical protein, truncation/fusion 

SAG1873 prophage LambdaSa2, replicative DNA helicase 
SAG1877 prophage LambdaSa2, antirepressor protein, putative 
SAG1879 hypothetical protein 

SAG1882 prophage LambdaSa2, repressor protein, putative 
SAG1884 hypothetical protein 

SAG1885 prophage LambdaSa2, site-specific recombinase, phage integrase family 



Cluster 16 

SAG1247 site-specific recombinase, phage integrase family 

SAG1250 Tn5252, relaxase 

SAG1251 Tn5252, Orf 9 protein 

SAG1252 Tn5252, Orf 10 protein 

SAG1256 IS861, transposase OrfB, truncation 

SAG1257 cation-transporting ATPase, E1-E2 family 

SAG1258 cadmium efflux system accessory protein 

SAG1259 conserved hypothetical protein 

SAG1260 hypothetical protein 

SAG1261 conserved hypothetical protein 



SAG1262 cation-transporting ATPase, E1-E2 family 

SAG1263 conserved domain protein, authentic frameshift 

SAG1264 transcriptional repressor CopY, putative 

SAG1265 cadmium resistance transporter, putative 

SAG1266 hypothetical protein 

SAG1267 hypothetical protein 

SAG1268 repressor protein, putative 

SAG1270 ImpB/MucB/SamB family protein 

SAG1271 conserved hypothetical protein 

SAG1272 conserved hypothetical protein 

SAG1273 conserved hypothetical protein 

SAG1274 conserved hypothetical protein 

SAG1276 conserved hypothetical protein 

SAG1277 hypothetical protein 

SAG1278 hypothetical protein 

SAG1279 conserved domain protein 

SAG1280 SNF2 family protein 

SAG1281 hypothetical protein 

SAG1283 agglutinin receptor 

SAG1284 abortive infection protein AbiGI 

SAG1285 abortive infection protein AbiGII 

SAG1286 Tn5252, Orf28 

SAG1287 Tn5252, Orf26 



SAG1288 Tn5252, Orf25, degenerate 

SAG1289 Tn5252, Orf23 

SAG1290 hypothetical protein 

SAG1291 Tn5252, Orf 21 protein, internal deletion 

SAG1292 hypothetical protein 

SAG1293 protease, putative 

SAG1294 conserved hypothetical protein 

SAG1295 conserved hypothetical protein 

SAG1296 conserved hypothetical protein 

SAG1297 C-5 cytosine-specific DNA methylase 

SAG1299 conserved hypothetical protein 

SAG1304 hypothetical protein 
Table 7 



Locus Annotation 

Housekeeping 

SAG0466 thiolase 

SAG0471 glucokinase 

SAG0492 amino acid ABC transporter, ATP-binding protein 

SAG0767 D-alanine--D-alanine ligase 

SAG1086 xanthine phosphoribosyltransferase 

SAG1600 glutamate racemase 

SAG1680 shikimate 5-dehydrogenase 

SAG1723 signal peptidase I 



Surface-exposed 




SAG0079 adenylate kinase 

SAG0093 D-alanyl-D-alanine carboxypeptidase family protein 

SAG0163 competence protein CglA 

SAG0290 ABC transporter, substrate-binding protein 

SAG0368 protein of unknown function 

SAG0503 lipase/acylhydrolase 

SAG1473 cell wall surface anchor family protein 

SAG1552 conserved hypothetical protein 

SAG1641 YaeC family protein 

SAG2147 protein of unknown function/lipoprotein, putative 

SAG2148 LysM domain protein 



Table 8 



ORFxxxxx Annotation 

ORF00003 PcsB protein (pscB) 

ORF00004 ribose-phosphate pyrophosphokinase (prsA) 

ORF00005 aminotransferase, class I 

ORF00006 recombination protein O 

ORF00009 fatty acid/phospholipid synthesis protein PlsX (plsX) 

ORF00011 phosphoribosylaminoimidazole-succinocarboxamide synthase (purC) 

ORF00012 phosphoribosylformylglycinamidine synthase, putative 

ORF00013 amidophosphoribosyltransferase (purF) 

ORF00014 phosphoribosylformylglycinamidine cyclo-ligase (purM) 

ORF00015 phosphoribosylglycinamide formyltransferase (purN) 

ORF00020 group B streptococcal surface immunogenic protein 

ORF00021 N-acetylmannosamine-6-P epimerase, putative 

ORF00022 sugar ABC transporter, sugar-binding protein 

ORF00023 sugar ABC transporter, permease protein 

ORF00024 sugar ABC transporter, permease protein 

ORF00026 conserved hypothetical protein 

ORF00027 N-acetylneuraminate lyase, putative 

ORF00028 expressed ROK family protein 

ORF00030 phosphosugar-binding transcriptional regulator, RpiR family, putative 

ORF00031 phosphoribosylamine-glycine ligase (purD) 

ORF00032 phosphoribosylaminoimidazole carboxylase, catalytic subunit (purE) 
ORF00033 phosphoribosylaminoimidazole carboxylase, ATPase subunit (purK) 
ORF00036 adenylosuccinate lyase (purB) 
ORF00037 transcriptional regulator, Cro/CI family 

ORF00038 Holliday junction DNA helicase RuvB (ruvB) 

ORF00039 phosphotyrosine protein phosphatase, low molecular weight 

ORF00040 MORN motif family protein 

ORF00041 membrane protein, putative 

ORF00043 alcohol dehydrogenase, propanol-preferring (adhP) 

ORF00045 MATE efflux family protein 

ORF00046 ribosomal protein S10 (rpsJ) 

ORF00047 ribosomal protein L3 (rplC) 

ORF00048 ribosomal protein L4 (rplD) 

ORF00049 ribosomal protein L23 (rplW) 

ORF00050 ribosomal protein L2 (rplB) 

ORF00052 ribosomal protein S19 (rpsS) 

ORF00054 ribosomal protein L22 (rplV) 

ORF00055 ribosomal protein S3 (rpsC) 

ORF00056 ribosomal protein L16 (rplP) 

ORF00058 ribosomal protein L29 (rpmC) 

ORF00059 ribosomal protein S17 (rpsQ) 

ORF00060 ribosomal protein L14 (rplN) 

ORF00061 ribosomal protein L24 (rplX) 

ORF00063 ribosomal protein L5 (rplE) 

ORF00065 ribosomal protein S8 (rpsH) 

ORF00066 ribosomal protein L6 (rplF) 

ORF00068 ribosomal protein L18 (rplR) 

ORF00069 ribosomal protein S5 (rpsE) 

ORF00070 ribosomal protein L30 (rpmD) 

ORF00071 ribosomal protein L15 (rplO) 

ORF00072 preprotein translocase, SecY subunit 

ORF00073 adenylate kinase (adk) 

ORF00074 translation initiation factor IF-1 (infA) 

ORF00075 ribosomal protein L36 (rpmJ) 

ORF00077 ribosomal protein S13 (rpsM) 
ORF00078 ribosomal protein S11 (rpsK) 

ORF00080 DNA-directed RNA polymerase, alpha subunit (rpoA) 

ORF00093 transcriptional regulator ComX1, putative 

ORF00094 phosphoglycerate mutase family protein 

ORF00097 heat-inducible transcription repressor HrcA (hrcA) 

ORF00098 heat shock protein GrpE (grpE) 

ORF00099 dnaK protein (dnaK) 

ORF00100 dnaJ protein (dnaJ) 

ORF00101 transcriptional regulator, GntR family 

ORF00102 tRNA pseudouridine synthase A (truA) 

ORF00103 phosphomethylpyrimidine kinase, putative 

ORF00104 conserved hypothetical protein 

ORF00105 conserved hypothetical protein 

ORF00106 conserved hypothetical protein 

ORF00107 trigger factor (tig) 

ORF00108 DNA-directed RNA polymerase, delta subunit, putative 

ORF00109 CTP synthase (pyrG) 

ORF00111 deoxyuridine 5'-triphosphate nucleotidohydrolase (dut) 

ORF00113 carbonic anhydrase-related protein 

ORF00115 pyridine nucleotide-disulphide oxidoreductase family protein 

ORF00116 glutamyl-tRNA synthetase (gltX) 

ORF00119 ribose ABC transporter, ATP-binding protein (rbsA) 

ORF00122 ribose operon repressor RbsR (rbsR) 

ORF00125 ABC transporter, ATP-binding protein 

ORF00126 DNA-binding response regulator 

ORF00128 sensor histidine kinase 

ORF00131 fructose-bisphosphate aldolase (fba) 

ORF00132 L-2-hydroxyisocaproate dehydrogenase 

ORF00133 ribosomal protein L28 (rpmB) 

ORF00134 conserved hypothetical protein 

ORF00135 DAK2 domain protein 

ORF00136 expressed SPFH domain/Band 7 family protein 

ORF00141 amino acid ABC transporter, ATP-binding protein 

ORF00142 amino acid ABC transporter, amino acid-binding protein/permease protein 

ORF00143 conserved hypothetical protein 

ORF00145 undecaprenol kinase, putative 

ORF00146 negative regulator of competence MecA, putative 

ORF00149 ABC transporter, ATP-binding protein 

ORF00150 conserved hypothetical protein 

ORF00151 selenocysteine lyase (csdB) 

ORF00152 NifU family protein 

ORF00153 conserved hypothetical protein 

ORF00155 D-alanyl-D-alanine carboxypeptidase 

ORF00158 oligopeptide ABC transporter, permease protein 

ORF00160 oligopeptide ABC transporter, ATP-binding protein 

ORF00161 oligopeptide ABC transporter, ATP-binding protein 

ORF00167 adc operon repressor AdcR (adcR) 

ORF00168 zinc ABC transporter, ATP-binding protein 

ORF00169 zinc ABC transporter, permease protein 

ORF00172 tyrosyl-tRNA synthetase (tyrS) 

ORF00173 penicillin-binding protein 1B, putative 

ORF00174 DNA-directed RNA polymerase, beta subunit (rpoB) 

ORF00176 DNA-directed RNA polymerase, beta' subunit (rpoC) 

ORF00178 conserved hypothetical protein 

ORF00179 competence protein CglA (cglA) 
ORF00180 competence protein CglB (cglB) 

ORF00181 conserved hypothetical protein 

ORF00183 conserved hypothetical protein 

ORF00184 acetate kinase (ackA) 

ORF00190 pyrroline-5-carboxylate reductase (proC) 

ORF00191 glutamyl-aminopeptidase (pepA) 

ORF00198 single-strand binding protein (ssb) 

ORF00211 PTS system, IIABC components 

ORF00212 alpha amylase family protein 

ORF00214 transcriptional antiterminator, BglG family 

ORF00219 PTS system, IIC component, putative 

ORF00224 ribosomal protein S15 (rpsO) 

ORF00225 polyribonucleotide nucleotidyltransferase (pnp) 

ORF00227 serine O-acetyltransferase (cysE) 

ORF00229 cysteinyl-tRNA synthetase (cysS) 

ORF00230 conserved hypothetical protein 

ORF00231 RNA methyltransferase, TrmH family, group 3 

ORF00233 DegV family protein 

ORF00236 ribosomal protein L13 (rplM) 

ORF00237 ribosomal protein S9 (rpsI) 

ORF00261 transcriptional regulator, MutR family 

ORF00262 transporter, putative 

ORF00263 amino acid ABC transporter, permease protein 

ORF00264 amino acid ABC transporter, amino acid-binding protein 

ORF00265 amino acid ABC transporter, permease protein 

ORF00266 amino acid ABC transporter, ATP-binding protein 

ORF00295 N-acetylglucosamine-6-phosphate deacetylase (nagA) 

ORF00296 conserved hypothetical protein 

ORF00297 glycyl-tRNA synthetase, alpha subunit (glyQ) 

ORF00299 glycyl-tRNA synthetase, beta subunit (glyS) 

ORF00300 conserved hypothetical protein 

ORF00302 glycerol kinase (glpK) 

ORF00303 alpha-glycerophosphate oxidase 

ORF00304 glycerol uptake facilitator protein (glpF) 

ORF00306 conserved hypothetical protein 

ORF00307 transketolase (tkt) 

ORF00309 ABC transporter, ATP-binding protein 

ORF00310 membrane protein, putative 

ORF00313 PTS system, IIBC components 

ORF00314 glutamate 5-kinase (proB) 

ORF00315 gamma-glutamyl phosphate reductase (proA) 

ORF00316 conserved hypothetical protein TIGR00006 

ORF00318 penicillin-binding protein 2X (pbpX) 

ORF00319 phospho-N-acetylmuramoyl-pentapeptide-transferase (mraY) 
ORF00320 ATP-dependent RNA helicase, DEAD/DEAH box family 

ORF00321 ABC transporter, substrate-binding protein 

ORF00322 amino acid ABC transporter, permease protein 

ORF00323 amino acid ABC transporter, ATP-binding protein 

ORF00325 thioredoxin reductase (trxB) 

ORF00326 conserved hypothetical protein 

ORF00327 NAD synthetase (nadE) 

ORF00328 aminopeptidase C (pepC) 

ORF00329 penicillin-binding protein 1A (pbp1A) 

ORF00330 recombination protein U (recU) 

ORF00331 conserved hypothetical protein 
ORF00335 conserved hypothetical protein 

ORF00336 conserved hypothetical protein 

ORF00337 autoinducer-2 production protein LuxS (luxS) 

ORF00338 KH domain protein 

ORF00348 guanylate kinase (gmk) 

ORF00349 DNA-directed RNA polymerase, omega subunit, putative 

ORF00350 primosomal protein N' (priA) 

ORF00351 methionyl-tRNA formyltransferase (fmt) 

ORF00352 Sun protein (sun) 

ORF00353 serine/threonine phosphatase, putative 

ORF00354 serine/threonine protein kinase 

ORF00355 conserved hypothetical protein 

ORF00356 sensor histidine kinase, putative 

ORF00358 DNA-binding response regulator 

ORF00359 hydrolase, haloacid dehalogenase family/peptidyl-prolyl cis-trans isomerase, cyclophilin type 

ORF00360 general stress protein, putative 

ORF00361 pyruvate formate-lyase-activating enzyme (pflA) 

ORF00362 transcriptional regulator, DeoR family 

ORF00363 transcriptional regulator, putative 

ORF00364 PTS system, cellobiose-specific IIA component (celC) 

ORF00366 PTS system, cellobiose-specific IIB component (celA) 

ORF00367 PTS system, cellobiose-specific IIC component (celB) 

ORF00368 formate acetyltransferase (pflD) 

ORF00369 transaldolase family protein 

ORF00371 glycerol dehydrogenase (gldA) 

ORF00372 cysteine synthase A (cysK) 

ORF00373 conserved hypothetical protein TIGR00257 

ORF00374 helicase, putative 

ORF00375 competence protein F, putative 

ORF00376 ribosomal subunit interface protein (yfiA) 

ORF00385 enoyl-CoA hydratase/isomerase family protein 

ORF00386 transcriptional regulator, MarR family 

ORF00387 3-oxoacyl-(acyl-carrier-protein) synthase III (fabH) 

ORF00388 acyl carrier protein (acpP) 

ORF00390 enoyl-(acyl-carrier-protein) reductase II (fabK) 

ORF00391 malonyl CoA-acyl carrier protein transacylase (fabD) 

ORF00392 3-oxoacyl-[acyl-carrier protein] reductase (fabG) 

ORF00393 3-oxoacyl-(acyl-carrier-protein) synthase II (fabF) 

ORF00394 acetyl-CoA carboxylase, biotin carboxyl carrier protein (accB) 

ORF00395 (3R)-hydroxymyristoyl-(acyl-carrier-protein) dehydratase (fabZ) 

ORF00396 acetyl-CoA carboxylase, biotin carboxylase (accC) 

ORF00397 acetyl-CoA carboxylase, carboxyl transferase, beta subunit (accD) 

ORF00398 acetyl-CoA carboxylase, carboxyl transferase, alpha subunit (accA) 

ORF00400 seryl-tRNA synthetase (serS) 

ORF00403 conserved hypothetical protein 

ORF00404 PTS system, mannose-specific IID component 

ORF00405 PTS system, mannose-specific IIC component (manM) 

ORF00406 PTS system, mannose-specific IIAB components (manL) 

ORF00407 hydrolase, haloacid dehalogenase-like family 

ORF00410 xanthine/uracil permease family protein 

ORF00411 conserved hypothetical protein TIGR00150, putative 

ORF00412 acetyltransferase, GNAT family 

ORF00413 expressed protein of unknown function 

ORF00415 HIT family protein (hit) 

ORF00419 ABC transporter, ATP-binding protein 
ORF00421 ABC transporter, permease protein 

ORF00422 conserved hypothetical protein 

ORF00423 conserved hypothetical protein TIGR00091 

ORF00424 conserved hypothetical protein, POINT MUTATION 

ORF00425 N utilization substance protein A (nusA) 

ORF00426 conserved hypothetical protein 

ORF00427 ribosomal protein L7A family 

ORF00428 translation initiation factor IF-2 

ORF00429 ribosome-binding factor A (rbfA) 

ORF00432 copper-transporter ATPase CopA 

ORF00435 hydrolase, haloacid dehalogenase-like family 

ORF00436 DNA polymerase I (polA) 

ORF00437 CoA binding domain protein 

ORF00440 DNA-binding response regulator 

ORF00441 sensor histidine kinase 

ORF00443 queuine tRNA-ribosyltransferase (tgt) 

ORF00444 conserved hypothetical protein 

ORF00449 glucose-6-phosphate isomerase (pgi) 

ORF00451 rhomboid family protein 

ORF00452 expressed putative lipoprotein 

ORF00453 UTP-glucose-1-phosphate uridylyltransferase (galU) 

ORF00454 glycerol-3-phosphate dehydrogenase (NAD(P)+) (gpsA) 

ORF00455 ribonuclease P protein component (rnpA) 

ORF00456 SpoIIIJ family protein 

ORF00458 R3H domain protein 

ORF00463 conserved hypothetical protein 

ORF00464 RecX protein 

ORF00465 RNA methyltransferase, TrmA family 

ORF00470 ribonucleoside-diphosphate reductase 2, beta subunit (nrdF) 

ORF00472 ribonucleoside-diphosphate reductase 2, alpha subunit (nrdE) 

ORF00482 alcohol dehydrogenase, zinc-containing 

ORF00483 oxidoreductase, aldo/keto reductase family 

ORF00484 cation efflux system protein 

ORF00485 transcriptional regulator, TetR family 

ORF00496 conserved hypothetical protein 

ORF00500 acetyltransferase, GNAT family 

ORF00501 conserved hypothetical protein 

ORF00502 valyl-tRNA synthetase (valS) 

ORF00508 aspartate-ammonia ligase (asnA) 

ORF00511 type II DNA modification methyltransferase, putative 

ORF00513 phosphopantetheine adenylyltransferase (coaD) 

ORF00515 conserved hypothetical protein 

ORF00519 conserved hypothetical protein 

ORF00520 conserved hypothetical protein TIGR00048 

ORF00522 ABC transporter, ATP-binding/permease protein 

ORF00523 ABC transporter, ATP-binding/permease protein 

ORF00524 anthranilate synthase component II (trpG) 

ORF00532 endonuclease III (nth) 

ORF00534 conserved hypothetical protein 

ORF00535 glucokinase (glk) 

ORF00536 expressed protein with rhodanese domain 

ORF00537 elongation factor Tu family protein 

ORF00540 UDP-N-acetylmuramoylalanine--D-glutamate ligase (murD) 

ORF00541 UDP-N-acetylglucosamine-N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase (murG) 

ORF00542 cell division protein DivIB, putative 

ORF00544 cell division protein FtsA (ftsA) 

ORF00545 cell division protein FtsZ (ftsZ) 

ORF00546 ylmE protein, putative 

ORF00547 ylmF protein (ylmF) 

ORF00549 ylmH protein (ylmH) 

ORF00550 cell division protein DivIVA, putative 

ORF00552 isoleucyl-tRNA synthetase (ileS) 

ORF00553 conserved hypothetical protein 

ORF00554 MutT/nudix family protein 

ORF00555 ATP-dependent Clp protease, ATP-binding subunit 

ORF00557 conserved hypothetical protein 

ORF00558 amino acid ABC transporter, permease protein 

ORF00559 amino acid ABC transporter, ATP-binding protein 

ORF00560 phosphoglucomutase/phosphomannomutase family protein 

ORF00562 methylenetetrahydrofolate dehydrogenase/methenyltetrahydrofolate cyclohydrolase (folD) 

ORF00564 exodeoxyribonuclease VII, large subunit (xseA) 

ORF00566 geranyltranstransferase, putative 

ORF00567 hemolysin A 

ORF00570 DNA repair protein RecN (recN) 

ORF00571 expressed DegV family protein 

ORF00574 DNA-binding protein HU (hup) 

ORF00576 dihydroorotate dehydrogenase A (pyrDA) 

ORF00577 beta-lactam resistance factor (fibB) 

ORF00578 beta-lactam resistance factor (fibA) 

ORF00579 murM protein, putative 

ORF00580 hydrolase, haloacid dehalogenase-like family 

ORF00581 HD domain protein 

ORF00582 conserved hypothetical protein 

ORF00583 cation-transporting ATPase, E1-E2 family 

ORF00588 cell division ABC transporter, ATP-binding protein FtsE (ftsE) 

ORF00589 cell division ABC transporter, permease protein FtsX (ftsX) 

ORF00591 metallo-beta-lactamase superfamily protein 

ORF00593 DNA polymerase III, epsilon subunit/ATP-dependent helicase DinG 

ORF00595 aspartate aminotransferase (aspC) 

ORF00596 asparaginyl-tRNA synthetase (asnS) 

ORF00601 conserved hypothetical protein 

ORF00602 conserved hypothetical protein 

ORF00603 conserved hypothetical protein 

ORF00605 zinc ABC transporter, zinc-binding adhesion lipoprotein 

ORF00606 ribosomal protein L31 (rpmE) 

ORF00607 DHH family protein 

ORF00609 flavodoxin 

ORF00614 ribosomal protein L19 (rplS) 

ORF00640 prophage LambdaSa1, single-strand binding protein (ssb) 

ORF00693 DNA-binding response regulator VncR (vncR) 

ORF00694 sensor histidine kinase VncS (vncS) 

ORF00699 rod shape-determining protein RodA, putative (rodA) 

ORF00700 hydrolase, haloacid dehalogenase-like family 

ORF00701 DNA gyrase, B subunit (gyrB) 

ORF00702 septation ring formation regulator EzrA, putative 

ORF00705 conserved hypothetical protein 

ORF00706 enolase (eno) 

ORF00708 3-phosphoshikimate 1-carboxyvinyltransferase (aroA) 

ORF00709 shikimate kinase (aroK) 

ORF00710 psr protein 

ORF00711 RNA methyltransferase, TrmA family 

ORF00729 sortase family protein 

ORF00731 sortase family protein 

ORF00734 sortase family protein, FRAMESHIFT 

ORF00743 ABC transporter, ATP-binding protein 

ORF00744 membrane protein 

ORF00745 conserved hypothetical protein 

ORF00748 cylG protein (cylG) 

ORF00776 DNA-entry nuclease, putative 

ORF00789 2-keto-3-deoxygluconate kinase 

ORF00792 2-dehydro-3-deoxyphosphogluconate aldolase/4-hydroxy-2-oxoglutarate aldolase (eda) 

ORF00798 proline dipeptidase (pepQ) 

ORF00799 transcriptional regulator, RegM family 

ORF00802 glycosyl transferase, group 1 family protein 

ORF00803 threonyl-tRNA synthetase (thrS) 

ORF00804 DNA-binding response regulator 

ORF00808 amino acid ABC transporter, permease protein 

ORF00810 amino acid ABC transporter, ATP-binding protein 

ORF00811 DNA-binding response regulator 

ORF00812 sensory box histidine kinase 

ORF00813 metallo-beta-lactamase family protein 

ORF00815 ribonuclease III (rnc) 

ORF00816 expressed putative chromosome segregation SMC protein 

ORF00817 hydrolase, haloacid dehalogenase-like family 

ORF00818 hydrolase, haloacid dehalogenase-like family 

ORF00819 signal recognition particle-docking protein FtsY (ftsY) 

ORF00820 ABC transporter, substrate-binding protein 

ORF00821 ABC transporter, permease protein, putative 

ORF00824 transcriptional accessory protein Tex, putative 

ORF00825 conserved hypothetical protein 

ORF00828 HPr(Ser) kinase/phosphatase (hprK) 

ORF00830 prolipoprotein diacylglyceryl transferase (lgt) 

ORF00832 conserved hypothetical protein 

ORF00835 peptidase, U32 family, putative 

ORF00836 peptidase, U32 family 

ORF00837 conserved hypothetical protein 

ORF00844 lysyl-tRNA synthetase (lysS) 

ORF00846 phosphoglycerate mutase family protein 

ORF00847 ebsC family protein, putative 

ORF00850 peptidase, U32 family 

ORF00855 oligoendopeptidase F, putative 

ORF00856 phosphoenolpyruvate carboxylase (ppc) 

ORF00859 cell division protein, FtsW/RodA/SpoVE family (ftsW) 

ORF00861 translation elongation factor Tu (tuf) 

ORF00863 triosephosphate isomerase (tpiA) 

ORF00865 phosphoglycerate mutase (gpmA) 

ORF00867 recombination protein RecR (recR) 

ORF00868 D-alanine-D-alanine ligase 

ORF00869 UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate--D-alanyl-D-alanyl ligase (murF) 

ORF00870 oxalate:formate antiporter 

ORF00871 membrane protein, putative 

ORF00873 peptide chain release factor 3 (prfC) 

ORF00876 ABC transporter, ATP-binding protein 

ORF00880 ATP-dependent RNA helicase, DEAD/DEAH box family 

ORF00882 conserved hypothetical protein 

ORF00883 conserved hypothetical protein 

ORF00884 acyltransferase family protein 

ORF00885 competence protein CelA (celA) 

ORF00887 DNA internalization-related competence protein ComEC/Rec2 

ORF00889 sugar-binding transcriptional regulator, LacI family 

ORF00892 DNA polymerase III, delta subunit, putative 

ORF00893 superoxide dismutase, Fe-Mn (sodA) 

ORF00894 transcriptional antiterminator LicT 

ORF00895 PTS system, beta-glucosides-specific IIABC components 

ORF00896 6-phospho-beta-glucosidase (bglA) 

ORF00899 glycerate kinase 2 (garK) 

ORF00904 S-adenosylmethionine:tRNA ribosyltransferase-isomerase (queA) 

ORF00906 glucosamine-6-phosphate isomerase (nagB) 

ORF00908 ribosomal small subunit pseudouridine synthase 

ORF00911 competence protein CoiA (coiA) 

ORF00912 oligoendopeptidase B (pepB) 

ORF00914 O-methyltransferase family protein 

ORF00916 protease maturation protein, putative 

ORF00919 alanyl-tRNA synthetase (alaS) 

ORF00925 transcriptional regulator, Cro/CI family 

ORF00928 ribonucleoside-diphosphate reductase 2, beta subunit (nrdF) 

ORF00929 ribonucleoside-diphosphate reductase 2, alpha subunit (nrdE) 

ORF00930 ribonucleoside-diphosphate reductase 2, NrdH-redoxin (nrdH) 

ORF00931 phosphocarrier protein HPr (ptsH) 

ORF00932 phosphoenolpyruvate-protein phosphotransferase (ptsI) 

ORF00933 glyceraldehyde-3-phosphate dehydrogenase, NADP-dependent (gapN) 

ORF00934 polysaccharide deacetylase family protein 

ORF00935 ATP-dependent RNA helicase, DEAD/DEAH box family 

ORF00936 uridine kinase (udk) 

ORF00937 conserved hypothetical protein 

ORF00938 DNA polymerase III, gamma and tau subunits (dnaX) 

ORF00940 biotin--acetyl-CoA-carboxylase ligase 

ORF00941 S-adenosylmethionine synthetase (metK) 

ORF00955 UDP-N-acetylglucosamine 1-carboxyvinyltransferase (murA) 

ORF00956 acetyltransferase, GNAT family 

ORF00957 CBS domain protein 

ORF00958 methionine aminopeptidase, type I (map) 

ORF00959 ribonuclease BN, putative 

ORF00962 conserved hypothetical protein 

ORF00963 DNA ligase, NAD-dependent (ligA) 

ORF00964 BmrU protein, putative 

ORF00966 pullulanase, putative 

ORF00973 ATP synthase F0, A subunit (atpB) 

ORF00974 ATP synthase F0, B subunit (atpF) 

ORF00975 ATP synthase F1, delta subunit (atpH) 

ORF00976 ATP synthase F1, alpha subunit (atpA) 

ORF00977 ATP synthase F1, gamma subunit (atpG) 

ORF00978 ATP synthase F1, beta subunit (atpD) 

ORF00979 ATP synthase F1, epsilon subunit (atpC) 

ORF00981 UDP-N-acetylglucosamine 1-carboxyvinyltransferase (murA) 

ORF00983 DNA-entry nuclease (endA) 

ORF00984 phenylalanyl-tRNA synthetase, alpha subunit (pheS) 

ORF00986 phenylalanyl-tRNA synthetase, beta subunit (pheT) 

ORF00988 exonuclease RexB (rexB) 

ORF00989 exonuclease RexA (rexA) 

ORF00991 tRNA modification GTPase TrmE (trmE) 

ORF00992 ABC transporter, ATP-binding protein 

ORF00993 acetoin dehydrogenase, thymine PPi dependent, E1 component, alpha subunit 

ORF00994 acetoin dehydrogenase, thymine PPi dependent, E1 component, beta subunit 

ORF00995 acetoin dehydrogenase, thymine PPi dependent E2 component, dihydrolipoamide 

ORF00996 acetoin dehydrogenase, thymine PPi dependent, E3 component, dihydrolipoamide dehydrogenase 

ORF00997 lipoate-protein ligase A (lplA) 

ORF00998 cobyric acid synthase, putative 

ORF00999 mur ligase family protein 

ORF01000 conserved hypothetical protein TIGR00159 

ORF01001 expressed protein of unknown function 

ORF01002 phosphoglucomutase/phosphomannomutase family protein 

ORF01005 oxygen-independent coproporphyrinogen III oxidase, putative 

ORF01006 conserved hypothetical protein 

ORF01007 hydrolase, haloacid dehalogenase-like family 

ORF01008 conserved hypothetical protein 

ORF01023 GTP-binding protein LepA (lepA) 

ORF01027 PilB-related protein 

ORF01030 cation-transporting ATPase, E1-E2 family 

ORF01033 conserved hypothetical protein 

ORF01040 Tn916, tetracycline resistance protein (tetM) 

ORF01057 transcriptional regulator, GntR family 

ORF01058 DNA polymerase III, alpha subunit (dnaE) 

ORF01059 6-phosphofructokinase (pfk) 

ORF01060 pyruvate kinase (pyk) 

ORF01063 glucosamine--fructose-6-phosphate aminotransferase (isomerizing) (glmS) 

ORF01066 phnA protein (phnA) 

ORF01068 amino acid ABC transporter, permease protein 

ORF01069 amino acid ABC transporter, ATP-binding protein 

ORF01070 amino acid ABC transporter, amino acid-binding protein 

ORF01072 ribosomal protein S20 (rpsT) 

ORF01073 pantothenate kinase (coaA) 

ORF01074 conserved hypothetical protein 

ORF01075 cytidine deaminase (cdd) 

ORF01076 expressed putative lipoprotein 

ORF01077 sugar ABC transporter, ATP-binding protein 

ORF01078 sugar ABC transporter, permease protein, putative 

ORF01079 sugar ABC transporter, permease protein, putative 

ORF01080 NADH oxidase (nox-2) 

ORF01081 L-lactate dehydrogenase (ldh) 

ORF01082 DNA gyrase, A subunit (gyrA) 

ORF01083 sortase SrtA (srtA) 

ORF01089 GMP synthase (guaA) 

ORF01090 transcriptional regulator, GntR family 

ORF01091 gid protein (gid) 

ORF01093 expressed putative lipoprotein 

ORF01097 ABC transporter, ATP-binding protein 

ORF01099 DNA-binding response regulator 

ORF01101 site-specific recombinase, phage integrase family 

ORF01106 signal recognition particle protein Ffh (ffh) 

ORF01108 conserved hypothetical protein 

ORF01109 sensor histidine kinase CiaH 

ORF01110 DNA-binding response regulator CiaR (ciaR) 

ORF01111 aminopeptidase N (pepN) 

ORF01112 phosphate transport system regulatory protein PhoU (phoU) 
ORF01113 phosphate ABC transporter, ATP-binding protein PstB, putative 
ORF01114 phosphate ABC transporter, ATP-binding protein PstB, putative 
ORF01115 phosphate ABC transporter, permease protein PstA, putative 

ORF01116 phosphate ABC transporter, permease protein 

ORF01117 phosphate ABC transporter, phosphate-binding protein 

ORF01118 NOL1/NOP2/sun family protein 

ORF01119 inositol monophosphatase family protein 

ORF01120 conserved hypothetical protein 

ORF01121 conserved hypothetical protein 

ORF01122 macrolide-efflux protein mreA/riboflavin biosynthesis protein RibF 

ORF01123 tRNA pseudouridine synthase B (truB) 

ORF01125 conserved hypothetical protein 

ORF01128 permease, putative 

ORF01129 ABC transporter, ATP-binding protein 

ORF01131 DNA topoisomerase I (topA) 

ORF01132 DprA/SMF protein, putative DNA processing factor (dprA) 

ORF01134 iron compound ABC transporter, ATP-binding protein 

ORF01137 acetyltransferase, CysE/LacA/LpxA/NodL family 

ORF01138 ribonuclease HII (rnhB) 

ORF01139 GTP-binding protein 

ORF01176 carbamoyl-phosphate synthase, large subunit (carB) 

ORF01177 carbamoyl-phosphate synthase, small subunit (carA) 

ORF01178 aspartate carbamoyltransferase (pyrB) 

ORF01179 dihydroorotase, multifunctional complex type (pyrC) 

ORF01180 orotate phosphoribosyltransferase (pyrE) 

ORF01181 orotidine 5'-phosphate decarboxylase (pyrF) 

ORF01183 ABC transporter, ATP-binding protein 

ORF01184 ribonucleotide reductase, truncation 

ORF01188 cardiolipin synthetase (cls) 

ORF01189 formate-tetrahydrofolate ligase (fhs) 

ORF01190 lipoate-protein ligase A (lplA) 

ORF01198 flavoprotein-related protein 

ORF01199 flavoprotein family protein 

ORF01200 membrane protein, putative 

ORF01201 phosphoglucomutase (pgm) 

ORF01203 IS861, transposase OrfB 

ORF01205 ABC transporter, ATP-binding/permease protein 

ORF01206 ABC transporter, ATP-binding/permease protein 

ORF01207 conserved hypothetical protein 

ORF01208 conserved hypothetical protein 

ORF01209 Serine hydroxymethyltransferase 

ORF01210 Sua5/YciO/YrdC/YwlC family protein 

ORF01211 modification methylase, HemK family 

ORF01212 peptide chain release factor 1 (prfA) 

ORF01213 thymidine kinase (tdk) 

ORF01214 4-oxalocrotonate tautomerase (xylM) 

ORF01216 ApbE family protein 

ORF01220 xanthine permease (pbuX) 

ORF01221 xanthine phosphoribosyltransferase (xpt) 

ORF01222 guanosine monophosphate reductase (guaC) 

ORF01227 phosphate acetyltransferase 

ORF01228 ribosomal large subunit pseudouridine synthase, RluD subfamily 

ORF01229 expressed protein of unknown function 

ORF01230 GTP pyrophosphokinase family protein 

ORF01231 conserved hypothetical protein 

ORF01232 ribose-phosphate pyrophosphokinase (prsA) 

ORF01233 cysteine desulphurase (iscS) 

ORF01234 conserved hypothetical protein 

ORF01235 conserved hypothetical protein 

ORF01236 DNA repair protein RadC (radC) 

ORF01238 6-phospho-beta-glucosidase (ascB) 

ORF01239 platelet activating factor, putative 

ORF01240 hydrolase, haloacid dehalogenase-like family

ORF01242 voltage-gated chloride channel family protein 

ORF01243 spermidine/putrescine ABC transporter, spermidine/putrescine-binding protein (potD) 

ORF01244 spermidine/putrescine ABC transporter, permease protein (potC) 

ORF01245 spermidine/putrescine ABC transporter, permease protein (potB) 

ORF01246 spermidine/putrescine ABC transporter, ATP-binding protein (potA) 

ORF01247 UDP-N-acetylenolpyruvoylglucosamine reductase (murB)

ORF01248 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine pyrophosphokinase (folK)

ORF01250 dihydropteroate synthase (folP)

ORF01251 GTP cyclohydrolase I (folE)

ORF01252 folylpolyglutamate synthase (folC)

ORF01259 aldehyde dehydrogenase family protein 

ORF01260 membrane protein 

ORF01274 gls24 protein, putative 

ORF01276 gls24 protein, putative 

ORF01279 conserved hypothetical protein 

ORF01282 ATP-dependent DNA helicase PcrA (pcrA)

ORF01283 conserved hypothetical protein, FRAMESHIFT 

ORF01284 uracil permease (uraA) 

ORF01285 sodium:alanine symporter family protein

ORF01286 cation efflux family protein 

ORF01290 ribosomal protein S1 (rpsA) 

ORF01292 branched-chain amino acid aminotransferase (ilvE) 

ORF01294 DNA topoisomerase IV, A subunit (parC) 

ORF01295 DNA topoisomerase IV, B subunit (parE)

ORF01296 membrane protein, putative 

ORF01297 uracil-DNA glycosylase (ung) 

ORF01317 transcriptional regulator, LysR family, putative 

ORF01319 purine nucleoside phosphorylase (deoD) 

ORF01321 purine nucleoside phosphorylase (deoD) 

ORF01323 phosphopentomutase (deoB) 

ORF01324 ribose 5-phosphate isomerase (rpiA) 

ORF01327 tributyrin esterase (estA) 

ORF01328 metallo-beta-lactamase superfamily protein 

ORF01329 ABC transporter, ATP-binding protein 

ORF01330 ABC transporter, permease protein

ORF01331 conserved hypothetical protein

ORF01332 adherence and virulence protein A (pavA) 

ORF01335 TPR domain protein 

ORF01336 membrane protein 

ORF01338 mutator MutT protein (mutX)

ORF01339 hyaluronidase 

ORF01343 iminodiacetate oxidase, putative

ORF01344 conserved hypothetical protein TIGR00486

ORF01345 conserved hypothetical protein 

ORF01346 DNA replication protein DnaD, putative

ORF01347 adenine phosphoribosyltransferase (apt) 
ORF01350 single-stranded-DNA-specific exonuclease RecJ (recJ)

ORF01351 oxidoreductase, short chain dehydrogenase/reductase family

ORF01352 metallo-beta-lactamase superfamily protein

ORF01353 conserved hypothetical protein 

ORF01354 GTP-binding protein HflX (hflX)

ORF01355 tRNA delta(2)-isopentenylpyrophosphate transferase (miaA)

ORF01357 exfoliative toxin A, putative

ORF01358 pullulanase, putative 

ORF01362 conserved hypothetical protein

ORF01363 peptidase, M20/M25/M40 family 

ORF01364 nitroreductase family protein

ORF01367 excinuclease ABC, C subunit (uvrC)

ORF01380 streptococcal histidine triad family protein

ORF01381 laminin-binding surface protein (lmb)

ORF01397 Tn5252, relaxase

ORF01403 mercuric reductase (merA)

ORF01406 IS861, transposase OrfB, truncation 

ORF01407 cation-transporting ATPase, E1-E2 family 

ORF01411 conserved hypothetical protein

ORF01412 cation-transporting ATPase, E1-E2 family 

ORF01415 transcriptional repressor CopY, putative 

ORF01416 cadmium resistance transporter, putative 

ORF01451 C-5 cytosine-specific DNA methylase

ORF01453 conserved hypothetical protein 

ORF01455 ribosomal protein L7/L12 (rplL)

ORF01456 ribosomal protein L10 (rplJ)

ORF01458 ATP-dependent Clp protease, ATP-binding subunit 

ORF01467 GTP-binding protein (cgpA) 

ORF01468 ATP-dependent Clp protease, ATP-binding subunit ClpX (clpX)

ORF01470 dihydrofolate reductase (folA)

ORF01471 thymidylate synthase (thyA) 

ORF01472 HMG-CoA synthase 

ORF01473 3-hydroxy-3-methylglutaryl-CoA reductase

ORF01474 conserved hypothetical protein 

ORF01475 hemolysin III, putative 

ORF01476 conserved hypothetical protein TIGR00147 

ORF01479 isopentenyl-diphosphate delta-isomerase 

ORF01480 phosphomevalonate kinase 

ORF01481 diphosphomevalonate decarboxylase (mvaD)

ORF01482 mevalonate kinase, putative

ORF01484 DNA-binding response regulator 

ORF01491 polypeptide deformylase, putative

ORF01495 ABC transporter, ATP-binding/permease protein 

ORF01496 ABC transporter, ATP-binding/permease protein 

ORF01498 ABC transporter, ATP-binding protein 

ORF01499 polyA polymerase family protein

ORF01500 DegV family protein

ORF01501 expressed protein of unknown function 

ORF01504 PTS system, fructose specific IIABC components

ORF01505 1-phosphofructokinase (fruK) 

ORF01506 lactose phosphotransferase system repressor (lacR)

ORF01507 beta-lactam resistance factor 

ORF01511 pyridine nucleotide-disulphide oxidoreductase family protein 

ORF01512 tRNA (guanine-N1)-methyltransferase (trmD)

ORF01513 16S rRNA processing protein RimM (rimM)

ORF01515 transcriptional regulator, RofA family

ORF01516 KH domain protein

ORF01517 ribosomal protein S16 (rpsP)

ORF01518 permease, putative 

ORF01519 ABC transporter, ATP-binding protein 

ORF01520 conserved hypothetical protein 

ORF01523 carbamoyl-phosphate synthase, small subunit (carA) 

ORF01524 pyrimidine operon regulatory protein (pyrR) 

ORF01525 ribosomal large subunit pseudouridine synthase, RluD subfamily 

ORF01526 lipoprotein signal peptidase (lspA)

ORF01527 transcriptional regulator, LysR family

ORF01528 ribosomal protein L27 (rpmA) 

ORF01529 conserved hypothetical protein 

ORF01530 ribosomal protein L21 (rplU) 

ORF01531 conserved hypothetical protein, FRAMESHIFT 

ORF01532 thiamine biosynthesis protein ThiI (thiI)

ORF01533 cysteine desulphurase (iscS)

ORF01536 glutathione reductase (gor)
ORF01537 conserved hypothetical protein 

ORF01538 chorismate synthase (aroC) 

ORF01539 3-dehydroquinate synthase (aroB)

ORF01540 3-dehydroquinate dehydratase (aroD) 

ORF01541 conserved hypothetical protein 

ORF01543 ribosomal protein L20 (rplT)

ORF01544 ribosomal protein L35 (rpmI)

ORF01545 translation initiation factor IF-3 (infC) 

ORF01546 cytidylate kinase (cmk)

ORF01548 ferredoxin, 4Fe-4S 

ORF01550 peptidase T (pepT)

ORF01551 polysaccharide biosynthesis protein, putative 

ORF01552 UDP-N-acetylmuramoylalanyl-D-glutamate-2,6-diaminopimelate ligase (murE)

ORF01553 iron compound ABC transporter, ATP-binding protein (fepC) 

ORF01555 iron compound ABC transporter, permease protein

ORF01556 iron compound ABC transporter, permease protein 

ORF01558 inorganic pyrophosphatase, manganese-dependent (ppa) 

ORF01559 pyruvate formate-lyase-activating enzyme (pflA)

ORF01560 CBS domain protein

ORF01561 conserved hypothetical protein 

ORF01564 PAP2 family protein 

ORF01565 membrane protein, putative 

ORF01567 expressed sortase family protein 

ORF01568 sortase family protein

ORF01571 rogB protein FRAMESHIFT (rogB) 

ORF01587 conserved hypothetical protein 

ORF01589 RNA polymerase sigma-70 factor (rpoD) 

ORF01590 DNA primase (dnaG) 

ORF01591 large conductance mechanosensitive channel protein (mscL) 

ORF01592 ribosomal protein S21 (rpsU) 

ORF01594 amino acid ABC transporter, amino acid-binding protein 

ORF01598 rhodanese family protein

ORF01602 glycogen phosphorylase (glgP) 

ORF01603 4-alpha-glucanotransferase (malQ)

ORF01604 maltose operon repressor MalR, putative

ORF01605 maltose/maltodextrin ABC transporter, maltose/maltodextrin-binding protein 
ORF01606 maltose ABC transporter, permease protein 
ORF01607 maltose ABC transporter, permease protein

ORF01614 preprotein translocase SecA subunit, putative 

ORF01619 preprotein translocase SecY family protein 

ORF01634 excinuclease ABC, B subunit (uvrB) 

ORF01636 glutamine ABC transporter, glutamine-binding protein/permease protein (glnP) 

ORF01637 glutamine ABC transporter, ATP-binding protein, GlnQ putative 

ORF01640 GTP-binding protein, GTP1/Obg family (obg) 

ORF01646 amidase family protein 

ORF01647 ribosomal small subunit pseudouridine synthase A (rsuA) 

ORF01648 oxidoreductase, aldo/keto reductase family

ORF01651 lactoylglutathione lyase (gloA) 

ORF01652 glycosyl transferase, group 2 family protein 

ORF01654 SsrA-binding protein (smpB)

ORF01655 exoribonuclease, VacB/Rnb family (vacB) 

ORF01657 preprotein translocase, SecG subunit 

ORF01658 multi-drug resistance protein 

ORF01662 dephospho-CoA kinase 

ORF01663 formamidopyrimidine-DNA glycosylase (mutM)

ORF01677 GTP-binding protein Era (era) 

ORF01678 diacylglycerol kinase (dgkA) 

ORF01679 conserved hypothetical protein TIGR00043

ORF01685 PhoH family protein

ORF01687 conserved hypothetical protein 

ORF01689 conserved hypothetical protein

ORF01690 ribosome recycling factor (frr) 

ORF01691 uridylate kinase (pyrH) 

ORF01693 peptide ABC transporter, ATP-binding protein FRAMESHIFT 

ORF01697 ribosomal protein L1 (rplA) 

ORF01698 ribosomal protein L11 (rplK) 

ORF01706 IS861, transposase OrfB 

ORF01707 chorismate binding enzyme 

ORF01708 FtsK/SpoIIIE family protein

ORF01709 peptidyl-prolyl cis-trans isomerase, cyclophilin-type 

ORF01710 manganese ABC transporter, permease protein 

ORF01711 manganese ABC transporter, ATP-binding protein 

ORF01712 manganese ABC transporter, manganese-binding adhesion lipoprotein

ORF01713 iron-dependent transcriptional regulator 

ORF01714 5-methylthioadenosine nucleosidase/S-adenosylhomocysteine nucleosidase (pfs) 

ORF01716 MutT/nudix family protein 

ORF01718 UDP-N-acetylglucosamine pyrophosphorylase (glmU) 

ORF01722 oxidoreductase, Gfo/Idh/MocA family

ORF01725 gluconate 5-dehydrogenase, putative 

ORF01726 conserved hypothetical protein

ORF01738 branched-chain amino acid transport system II carrier protein (brnQ) 

ORF01739 methionyl-tRNA synthetase (metG) 

ORF01745 exodeoxyribonuclease (exoA) 

ORF01746 conserved hypothetical protein

ORF01752 copper homeostasis protein CutC, putative 

ORF01755 tetrapyrrole methylase family protein 

ORF01756 conserved hypothetical protein

ORF01758 DNA polymerase III, delta prime subunit, putative

ORF01759 thymidylate kinase (tmk)

ORF01773 ATP-dependent Clp protease, proteolytic subunit ClpP (clpP)

ORF01774 uracil phosphoribosyltransferase (upp) 

ORF01777 RNA methyltransferase, TrmH family, group 2 
ORF01781 conserved hypothetical protein TIGR00278

ORF01782 ribosomal large subunit pseudouridine synthase B (rluB)

ORF01783 conserved hypothetical protein TIGR00281 

ORF01784 conserved hypothetical protein 

ORF01785 integrase/recombinase, phage integrase family 

ORF01786 CBS domain protein 

ORF01787 conserved hypothetical protein 

ORF01788 HAM1 protein 

ORF01789 glutamate racemase (murI)

ORF01791 membrane protein, putative 

ORF01792 transcriptional regulator, biotin repressor family 

ORF01793 membrane protein, putative 

ORF01795 RNA methyltransferase, TrmH family 

ORF01796 acylphosphatase 

ORF01797 lipoprotein, putative 

ORF01799 amino acid ABC transporter, permease protein 

ORF01801 amidase family protein

ORF01802 transcription elongation factor GreA (greA) 

ORF01803 conserved hypothetical protein 

ORF01804 acetyltransferase, GNAT family

ORF01805 UDP-N-acetylmuramate-alanine ligase (murC) 

ORF01806 conserved hypothetical protein 

ORF01808 expressed putative helicase

ORF01811 phosphoglycerate dehydrogenase-related protein

ORF01812 primosomal protein DnaI (dnaI)

ORF01813 conserved hypothetical protein 

ORF01814 conserved hypothetical protein TIGR00244 

ORF01815 sensor histidine kinase CsrS (csrS) 

ORF01816 DNA-binding response regulator CsrR (csrR)

ORF01817 conserved hypothetical protein 

ORF01818 heat shock protein HtpX (htpX) 

ORF01820 lemA protein (lemA) 

ORF01821 glucose-inhibited division protein B (gidB)

ORF01822 sodium transport family protein

ORF01823 potassium uptake protein, Trk family, putative

ORF01825 ABC transporter, ATP-binding protein 

ORF01828 branched-chain amino acid transport system II carrier protein (brnQ) 

ORF01829 alcohol dehydrogenase, zinc-containing (adh) 

ORF01830 ABC transporter, permease protein

ORF01831 ABC transporter, ATP-binding protein 

ORF01833 expressed YaeC family protein 

ORF01834 ABC transporter, substrate-binding protein

ORF01835 glutamine amidotransferase, class I 

ORF01837 conserved hypothetical protein TIGR01033

ORF01846 glycerol uptake facilitator protein (glpF) 

ORF01849 conserved hypothetical protein 

ORF01851 conserved hypothetical protein 

ORF01852 iojap-related protein 

ORF01854 conserved hypothetical protein TIGR00488 

ORF01855 conserved hypothetical protein TIGR00482 

ORF01856 conserved hypothetical protein TIGR00253 

ORF01857 GTP-binding protein

ORF01858 hydrolase, haloacid dehalogenase-like family 

ORF01860 glutamyl-tRNA(Gln) amidotransferase, B subunit (gatB)

ORF01861 glutamyl-tRNA(Gln) amidotransferase, A subunit (gatA)

ORF01862 glutamyl-tRNA(Gln) amidotransferase, C subunit (gatC)

ORF01867 isochorismatase family protein 

ORF01869 transcriptional regulator CodY, putative 

ORF01870 aminotransferase, class I 

ORF01871 universal stress protein family FRAMESHIFT

ORF01872 hydrolase, haloacid dehalogenase-like family 

ORF01873 asparaginase family protein 

ORF01874 shikimate 5-dehydrogenase (aroE)

ORF01876 ATP-dependent DNA helicase RecG (recG) 

ORF01878 alanine racemase (alr)

ORF01879 holo-(acyl-carrier-protein) synthase (acpS) 

ORF01881 preprotein translocase, SecA subunit (secA)

ORF01882 mannose-6-phosphate isomerase, class I (manA) 

ORF01883 fructokinase (scrK) 

ORF01885 PTS system, IIABC components

ORF01886 sucrose-6-phosphate hydrolase (scrB)

ORF01887 sucrose operon repressor ScrR (scrR) 

ORF01888 N utilization substance protein B (nusB) 

ORF01889 conserved hypothetical protein 

ORF01890 translation elongation factor P (efp) 

ORF01900 cytidine/deoxycytidylate deaminase family protein 

ORF01906 excinuclease ABC, A subunit (uvrA) 

ORF01907 conserved hypothetical protein 

ORF01908 magnesium transporter, CorA family (corA) 

ORF01909 ribosomal protein S18 (rpsR) 

ORF01910 single-strand binding protein (ssb) 

ORF01911 ribosomal protein S6 (rpsF) 

ORF01912 A/G-specific adenine glycosylase (mutY) 

ORF01914 thioredoxin (trx)

ORF01915 PAP2 family protein 

ORF01916 MutS2 family protein 

ORF01917 conserved hypothetical protein 

ORF01918 conserved hypothetical protein 

ORF01919 ribonuclease HIII (rnhC)

ORF01920 signal peptidase I

ORF01921 helicase, putative 

ORF01923 DNA-damage inducible protein P (dinP)

ORF01924 formate acetyltransferase (pflD) 

ORF01926 conserved hypothetical protein 

ORF01927 proteinase, putative, degenerate, FRAMESHIFT

ORF01929 glycerol uptake facilitator protein, putative 

ORF01930 universal stress protein family 

ORF01933 X-pro dipeptidyl-peptidase (pepX) 

ORF01937 ABC transporter, ATP-binding protein CydC (cydC) 

ORF01938 ABC transporter, ATP-binding protein CydD 

ORF01945 conserved hypothetical protein TIGR00103

ORF01948 exonuclease 

ORF01949 conserved hypothetical protein 

ORF01950 conserved hypothetical protein TIGR00275 

ORF01952 ribosomal protein S14 (rpsN) 

ORF01957 O-sialoglycoprotein endopeptidase family protein 
ORF01958 ribosomal-protein-alanine acetyltransferase, putative 

ORF01960 expressed protein of unknown function 

ORF01961 conserved hypothetical protein 

ORF01962 metallo-beta-lactamase superfamily protein 
ORF01963 conserved hypothetical protein

ORF01964 glutamine synthetase, type I (glnA) 

ORF01965 transcriptional regulator GlnR (glnR)

ORF01967 conserved hypothetical protein

ORF01969 phosphoglycerate kinase (pgk)

ORF01971 glyceraldehyde 3-phosphate dehydrogenase (gap) 

ORF01972 translation elongation factor G (fusA) 

ORF01973 ribosomal protein S7 (rpsG) 

ORF01974 ribosomal protein S12 (rpsL) 

ORF01975 pur operon repressor (purR) 

ORF01976 HD domain protein

ORF01977 conserved hypothetical protein 

ORF01978 conserved hypothetical protein 

ORF01979 ribulose-phosphate 3-epimerase (rpe) 

ORF01980 conserved hypothetical protein TIGR00157 

ORF01983 dimethyladenosine transferase (ksgA)

ORF01985 primase-related protein 

ORF01987 deoxyribonuclease, TatD family 

ORF01992 dltD protein (dltD)

ORF01993 D-alanyl carrier protein (dltC)

ORF01994 dltB protein (dltB)

ORF01996 D-alanine-activating enzyme (dltA)

ORF01997 sensor histidine kinase

ORF01998 DNA-binding response regulator

ORF01999 ribosomal protein L34 (rpmH) 

ORF02004 amino acid ABC transporter, ATP-binding protein 

ORF02007 conserved hypothetical protein 

ORF02008 transcriptional antiterminator, BglG family

ORF02017 sugar binding transcriptional regulator, LacI family

ORF02018 transaldolase family protein 

ORF02019 carbohydrate isomerase, AraD/FucA family

ORF02020 hexulose-6-phosphate isomerase, putative 

ORF02021 hexulose-6-phosphate synthase, putative

ORF02022 PTS system, IIA component

ORF02023 PTS system, IIB component

ORF02024 transport protein SgaT, putative 

ORF02027 adenylosuccinate synthetase (purA) 

ORF02033 chaperonin, 33 kDa (hslO)

ORF02034 NifR3/Smm1 family protein

ORF02037 ATP-dependent Clp protease, ATP-binding subunit 

ORF02038 transcriptional regulator CtsR (ctsR) 

ORF02040 translation elongation factor Ts (tsf) 

ORF02041 ribosomal protein S2 (rpsB) 

ORF02043 alkyl hydroperoxide reductase, subunit F (ahpF) 

ORF02076 prophage LambdaSa2, single-strand binding protein (ssb)

ORF02082 prophage LambdaSa2, type II DNA modification methyltransferase, putative

ORF02086 prophage LambdaSa2, replicative DNA helicase (dnaC)

ORF02104 endopeptidase O (pepO)

ORF02110 polypeptide deformylase (def) 

ORF02111 sugar binding transcriptional regulator RegR (regR) 

ORF02112 conserved hypothetical protein

ORF02113 PTS system, IID component

ORF02114 PTS system, IIC component

ORF02115 PTS system, IIB component

ORF02116 glucuronyl hydrolase

ORF02118 PTS system, IIA component

ORF02120 oxidoreductase, short-chain dehydrogenase/reductase family 

ORF02121 conserved hypothetical protein 

ORF02122 carbohydrate kinase, PfkB family 

ORF02123 2-dehydro-3-deoxyphosphogluconate aldolase/4-hydroxy-2-oxoglutarate aldolase (eda) 

ORF02127 DNA polymerase III, alpha subunit, Gram-positive type 

ORF02129 prolyl-tRNA synthetase (proS) 

ORF02130 membrane-associated zinc metalloprotease, putative 

ORF02131 phosphatidate cytidylyltransferase (cdsA) 

ORF02132 undecaprenyl diphosphate synthase (uppS) 

ORF02133 preprotein translocase, YajC subunit (yajC)

ORF02140 glucan 1,6-alpha-glucosidase (dexB)

ORF02141 sugar ABC transporter, ATP-binding protein (msmK) 

ORF02142 helix-turn-helix domain protein, fis-type 

ORF02144 tagatose 1,6-diphosphate aldolase (lacD) 

ORF02145 tagatose-6-phosphate kinase (lacC) 

ORF02146 galactose-6-phosphate isomerase, LacB subunit (lacB) 

ORF02147 galactose-6-phosphate isomerase, LacA subunit (lacA) 

ORF02149 PTS system, IIC component, putative 

ORF02150 PTS system, IIB component, putative

ORF02152 PTS system, IIA component, putative

ORF02153 lactose phosphotransferase system repressor (lacR)

ORF02157 adhesion lipoprotein 

ORF02158 expressed protein of unknown function TIGR00256 

ORF02159 GTP pyrophosphokinase (relA)

ORF02161 nrdI protein (nrdI)

ORF02164 iron ABC transporter, iron-binding protein

ORF02165 DNA-binding response regulator 

ORF02167 PTS system, IID component

ORF02168 PTS system, IIC component

ORF02174 ABC transporter, ATP-binding protein 

ORF02176 response regulator

ORF02177 conserved hypothetical protein 

ORF02178 PTS system, IIABC components 

ORF02179 sensor histidine kinase

ORF02180 phosphate regulon response regulator PhoB (phoB) 

ORF02182 phosphate ABC transporter, ATP-binding protein (pstB) 

ORF02183 phosphate ABC transporter, permease protein 

ORF02184 phosphate ABC transporter, permease protein 

ORF02188 conserved hypothetical protein TIGR00046

ORF02189 ribosomal protein L11 methyltransferase (prmA) 

ORF02197 conserved hypothetical protein 

ORF02199 ATPase, AAA family 

ORF02249 mercuric reductase (merA) 

ORF02272 DNA topology modulation protein FlaR, putative 

ORF02273 glycerol dehydrogenase, putative 

ORF02281 DNA-binding response regulator 

ORF02285 leucyl-tRNA synthetase (leuS) 

ORF02290 transcription antitermination protein NusG (nusG) 

ORF02293 penicillin-binding protein 2A (pbp2A) 

ORF02294 ribosomal large subunit pseudouridine synthase, RluD subfamily 

ORF02296 phosphopentomutase (deoB) 

ORF02297 deoxyribose-phosphate aldolase (deoC) 

ORF02300 uridine phosphorylase (udp)

ORF02302 chaperonin, 60 kDa (groEL)

ORF02303 chaperonin, 10 kDa (groES)

ORF02305 ABC transporter, ATP-binding protein 

ORF02306 ABC transporter, permease protein 

ORF02307 expressed putative lipoprotein 

ORF02309 glyoxalase family protein 

ORF02310 conserved hypothetical protein 

ORF02311 anaerobic ribonucleoside-triphosphate reductase activating protein (nrdG) 

ORF02312 acetyltransferase, GNAT family 

ORF02315 anaerobic ribonucleoside-triphosphate reductase (nrdD) 

ORF02318 conserved hypothetical protein

ORF02320 conserved hypothetical protein 

ORF02321 conserved hypothetical protein 

ORF02322 recA protein (recA) 

ORF02325 DNA-3-methyladenine glycosylase I (tag) 

ORF02327 Holliday junction DNA helicase RuvA (ruvA) 

ORF02329 DNA mismatch repair protein HexB (hexB) 

ORF02333 arginine repressor ArgR, putative 

ORF02334 arginyl-tRNA synthetase (argS) 

ORF02337 conserved hypothetical protein 

ORF02338 conserved hypothetical protein 

ORF02339 aspartyl-tRNA synthetase (aspS) 

ORF02340 histidyl-tRNA synthetase (hisS)

ORF02342 ribosomal protein L33 (rpmG)

ORF02357 DNA-binding response regulator

ORF02359 membrane protein, putative 

ORF02360 carbamate kinase (arcC) 

ORF02361 ornithine carbamoyltransferase (argF) 

ORF02364 amino acid ABC transporter, ATP-binding protein 

ORF02365 amino acid ABC transporter, permease and amino acid-binding protein 

ORF02370 membrane protein, putative

ORF02371 transcriptional regulator, TetR family, putative

ORF02373 ribosomal protein S4 (rpsD) 

ORF02374 conserved hypothetical protein 

ORF02375 replicative DNA helicase (dnaC) 

ORF02376 ribosomal protein L9 (rplI)

ORF02377 DHH family protein 

ORF02378 glucose inhibited division protein A (gidA)

ORF02380 tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase (trmU)

ORF02381 L-serine dehydratase, iron-sulfur-dependent, beta subunit (sdhB) 

ORF02382 L-serine dehydratase, iron-sulfur-dependent, alpha subunit (sdhA)

ORF02385 cobalt transport family protein 

ORF02386 ABC transporter, ATP-binding protein 

ORF02387 ABC transporter, ATP-binding protein, FRAMESHIFT

ORF02388 CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase (pgsA)

ORF02389 peptidase, M16 family 

ORF02390 conserved hypothetical protein 

ORF02391 conserved hypothetical protein 

ORF02392 recF protein (recF)

ORF02396 inosine-5'-monophosphate dehydrogenase (guaB) 

ORF02397 transcriptional regulator, ArgR family 

ORF02400 arginine deiminase (arcA) 

ORF02402 ornithine carbamoyltransferase (argF) 

ORF02404 carbamate kinase (arcC)

ORF02405 tryptophanyl-tRNA synthetase (trpS) 

ORF02407 conserved hypothetical protein
Table 8



ORFxxxxx Annotation 

ORF02408 ABC transporter, ATP-binding protein

ORF02409 ABC transporter, permease protein, putative 

ORF02410 conserved hypothetical protein TIGR00246 

ORF02411 serine protease 

ORF02412 partitioning protein, ParB family 

ORF02413 chromosomal replication initiator protein DnaA (dnaA)

ORF02415 DNA polymerase III, beta subunit (dnaN)

ORF02417 conserved hypothetical protein 

ORF02419 conserved hypothetical GTP-binding protein

ORF02420 peptidyl-tRNA hydrolase (pth) 

ORF02421 transcription-repair coupling factor (mfd) 

ORF02423 S4 domain protein 

ORF02424 cell division protein DivIC, putative 

ORF02426 expressed protein of unknown function 

ORF02427 MesJ/Ycf62 family protein

ORF02429 cell division protein FtsH (ftsH) 
Table 9: GBS genes shared with pneumococcus



ORFxxxxx Annotation



ORF00017 phosphoribosylaminoimidazolecarboxamide formyltransferase/IMP cyclohydrolase (purH)



ORF00025 conserved hypothetical protein



ORF00029 acetyl xylan esterase, putative

ORF00042 aldehyde-alcohol dehydrogenase (adhE) 



ORF00044 threonine synthase (thrC) 



ORF00081 ribosomal protein L17 (rplQ)



ORF00090 conserved hypothetical protein 
ORF00129 argininosuccinate synthase (argG)



ORF00156 oligopeptide ABC transporter, substrate-binding protein, putative 
ORF00189 protease, putative 



ORF00194 thioredoxin family protein
ORF00195 tRNA binding domain protein 



ORF00217 conserved domain protein 
ORF00218 PTS system, IIB component, putative 



ORF00220 transketolase, N-terminal subunit
ORF00221 transketolase, C-terminal subunit 



ORF00223 oxidoreductase, putative 
ORF00282 acetyltransferase, GNAT family 



ORF00290 IS1381, transposase OrfB 
ORF00291 IS1381, transposase OrfA 



ORF00293 conserved hypothetical protein 
ORF00301 membrane protein, putative



ORF00343 ABC transporter, permease protein, putative 
ORF00344 conserved hypothetical protein 



ORF00382 aspartate kinase family protein 
ORF00399 conserved hypothetical protein 



ORF00439 cell wall surface anchor family protein 
ORF00447 cytidine/deoxycytidylate deaminase family protein



ORF00450 5-formyltetrahydrofolate cyclo-ligase family protein 
ORF00480 transcriptional regulator, MerR family 



ORF00499 acetyltransferase, GNAT family 
ORF00504 magnesium transporter, CorA family 



ORF00521 VanZF domain protein 
ORF00612 IS1381, transposase OrfA



ORF00613 IS1381, transposase OrfB 



ORF00690 transmembrane protein Vexp1 (vex1)



ORF00691 ABC transporter, ATP-binding protein Vexp2 (vex2) 
ORF00692 transmembrane protein Vexp3 (vex3) 



ORF00714 conserved hypothetical protein 

ORF00732 expressed cell wall surface anchor family protein, putative 



ORF00774 ABC transporter, ATP-binding protein 
ORF00778 ABC transporter, ATP-binding protein 



ORF00780 conserved hypothetical protein 
ORF00790 beta-glucuronidase



ORF00800 alpha amylase family protein 
ORF00807 amino acid ABC transporter, permease protein 



ORF00809 amino acid ABC transporter, amino acid-binding protein 
ORF00814 conserved hypothetical protein 



ORF00823 bacterial luciferase family protein 

ORF00840 riboflavin biosynthesis protein RibD (ribD) 



ORF00841 riboflavin synthase, alpha subunit (ribE) 
ORF00842 riboflavin biosynthesis protein RibA (ribA) 



ORF00843 riboflavin synthase, beta subunit (ribH) 



ORF00866 penicillin-binding protein 2tT 
ORF00905 membrane protein, putative 
ORF00910 major facilitator family protein 

ORF00913 hydrolase, haloacid dehalogenase-like family 

ORF00918 conserved hypothetical protein 

ORF00945 conserved hypothetical protein 

ORF00948 ABC transporter, ATP-binding protein 

ORF00952 phosphomethylpyrimidine kinase (thiD)

ORF00953 hydroxyethylthiazole kinase (thiM)

ORF00954 thiamine-phosphate pyrophosphorylase (thiE)

ORF00961 GtrA family protein

ORF00967 1,4-alpha-glucan branching enzyme (glgB)

ORF00968 glucose-1-phosphate adenylyltransferase (glgC)

ORF00971 glycogen synthase (glgA)

ORF00985 acetyltransferase, GNAT family

ORF00990 magnesium transporter, CorA family, putative

ORF01022 nucleoside diphosphate kinase (ndk)

ORF01031 nucleoside diphosphate kinase domain protein

ORF01085 conserved hypothetical protein

ORF01087 IS1381, transposase OrfA

ORF01088 IS1381, transposase OrfB

ORF01098 ABC transporter, permease protein, putative

ORF01100 sensor histidine kinase

ORF01102 ABC transporter, substrate-binding protein

ORF01127 protease, putative

ORF01135 iron compound ABC transporter, permease protein 

ORF01136 iron compound ABC transporter, permease protein 

ORF01185 aspartate-semialdehyde dehydrogenase (asd) 

ORF01217 conserved hypothetical protein 

ORF01218 conserved hypothetical protein 

ORF01219 formate/nitrite transporter family protein 

ORF01226 oxidoreductase, short chain dehydrogenase/reductase family, FRAMESHIFT 

ORF01254 homoserine kinase (thrB)

ORF01255 homoserine dehydrogenase (hom)

ORF01264 transcriptional regulator, Cro/CI family

ORF01268 thiol peroxidase (psaD)

ORF01305 glycosyltransferase CpsJ(V) (cpsJ)

ORF01306 glycosyltransferase CpsO(V) (cpsO) 

ORF01313 CpsP protein (cpsP) 

ORF01314 cpsC protein (cpsC) 

ORF01315 capsular polysaccharide biosynthesis protein CpsB (cpsB)

ORF01316 capsular polysaccharide biosynthesis protein CpsA (cpsA)

ORF01326 conserved hypothetical protein 

ORF01333 alpha-acetolactate decarboxylase (budA) 

ORF01334 acetolactate synthase, catabolic (ilvK)

ORF01337 MutT/nudix family protein

ORF01369 MATE efflux family protein 

ORF01398 Tn5252, Orf 9 protein 

ORF01399 Tn5252, Orf 10 protein

ORF01446 protease, putative

ORF01447 conserved hypothetical protein 

ORF01449 conserved hypothetical protein 

ORF01492 NADP-specific glutamate dehydrogenase (gdhA)

ORF01569 expressed cell wall surface anchor family protein 

ORF01570 cell wall surface anchor family protein

ORF01574 polysaccharide biosynthesis protein 

ORF01579 nucleotidyl transferase, putative 


ORF01580 polysaccharide biosynthesis protein, putative



ORF01612 conserved hypothetical protein 



ORF01613 glycosyl transferase, group 1 family protein



ORF01617 conserved hypothetical protein 



ORF01618 conserved hypothetical protein 



ORF01621 glycosyl transferase, putative

ORF01622 glycosyl transferase, group 2 family protein 

ORF01623 glycosyl transferase, family 8, degenerate



ORF01624 IS1381, transposase OrfB 



ORF01625 IS1381, transposase OrfA 



ORF01626 glycosyl transferase, family 8



ORF01627 glycosyl transferase, family 8 



ORF01628 conserved hypothetical protein 

ORF01630 cell wall surface anchor family protein 



ORF01635 protease, putative



ORF01643 aminopeptidase PepS (pepS) 



ORF01702 peptidase, M20/M25/M40 family 



ORF01731 IS1381, transposase OrfA



ORF01732 IS1381, transposase OrfB



ORF01740 tellurite resistance protein TehB (tehB) 



ORF01747 methylated-DNA-protein-cysteine S-methyltransferase (ogt)



ORF01749 acetyltransferase, GNAT family 



ORF01763 AcuB family protein 



ORF01764 branched-chain amino acid ABC transporter, ATP-binding protein (livF) 
ORF01765 branched-chain amino acid ABC transporter, ATP-binding protein (livG) 



ORF01766 branched-chain amino acid ABC transporter, permease protein 
ORF01767 branched-chain amino acid ABC transporter, permease protein (livH) 



ORF01769 branched-chain amino acid ABC transporter, amino acid-binding protein 



ORF01775 aminotransferase, class I 



ORF01779 potassium uptake protein, Trk family 



ORF01780 cation uptake protein, Trk family 



ORF01824 cobalt transport family protein 



ORF01826 conserved hypothetical protein 



ORF01832 peptidase, M20/M25/M40 family 



ORF01845 conserved hypothetical protein 

ORF01848 transcriptional regulator, MerR family 



ORF01853 isochorismatase family protein
ORF01859 membrane protein 



ORF01875 oxidoreductase, aldo/keto reductase family 
ORF01880 phospho-2-dehydro-3-deoxyheptonate aldolase 



ORF01981 rRNA (guanine-N1-)-methyltransferase, putative

ORF02083 prophage LambdaSa2, DNA replication protein DnaC, putative 



ORF02101 Na+/H+ exchanger family protein 
ORF02107 membrane protein, putative



ORF02139 UDP-glucose 4-epimerase (galE) 



ORF02143 lacX protein



ORF02162 conserved hypothetical protein 
ORF02186 hemolysin precursor, putative 



ORF02192 transcriptional regulator, MerR family
ORF02195 MutT/nudix family protein 



ORF02228 IS1381, transposase OrfB 
ORF02229 IS1381, transposase OrfA 



ORF02233 conserved hypothetical protein 
ORF02234 conserved hypothetical protein 



ORF02276 5-methyltetrahydropteroyltriglutamate-homocysteine methyltransferase (metE) 
ORFxxxxx Annotation



ORF02278 branched-chain amino acid transport protein AzlC, putative



ORF02288 glycosyl transferase, family 8



ORF02289 glycosyl transferase, family 8 



ORF02341 ribosomal protein L32 (rpmF) 



ORF02343 conserved hypothetical protein



ORF02358 sensor histidine kinase 



ORF02369 conserved hypothetical protein 



ORF02384 LysM domain protein 



ORF02428 hypoxanthine-guanine phosphoribosyltransferase (hpt) 



ORF03011 ribosomal protein L33 



ORF03014 ribosomal protein L33 
ORFxxxxx Annotation

ORF00064 ribosomal protein S14, putative 

ORF00095 D-alanyl-D-alanine carboxypeptidase family protein 

ORF00096 N-acetylmuramoyl-L-alanine amidase, family 4 protein

ORF00110 conserved hypothetical protein 

ORF00112 DNA repair protein RadA (radA) 

ORF00124 permease, putative 

ORF00148 glycosyl transferase, group 4 family protein 

ORF00154 penicillin-binding protein 4, putative 

ORF00157 oligopeptide ABC transporter, permease protein

ORF00206 oligopeptide ABC transporter, oligopeptide-binding protein

ORF00207 oligopeptide ABC transporter, permease protein

ORF00208 oligopeptide ABC transporter, permease protein

ORF00209 peptide ABC transporter, ATP-binding protein

ORF00210 peptide ABC transporter, ATP-binding protein 

ORF00216 IS1548, transposase 

ORF00226 conserved hypothetical protein 

ORF00232 conserved hypothetical protein 

ORF00239 site-specific recombinase, phage integrase family 

ORF00250 conserved hypothetical protein 

ORF00251 conserved hypothetical protein 

ORF00289 ABC transporter, ATP-binding protein 

ORF00305 NADH oxidase, putative 

ORF00317 cell division protein FtsL, putative

ORF00333 conserved hypothetical protein 

ORF00383 hydrolase, haloacid dehalogenase-like family

ORF00430 expressed putative lipoprotein 

ORF00431 transcriptional repressor CopY 

ORF00434 membrane protein, putative 

ORF00438 transcriptional regulator, Fur family 

ORF00442 membrane protein, putative 

ORF00445 bioY family protein 

ORF00446 AtsA/ElaC family protein 

ORF00468 expressed putative protease 

ORF00469 glycosyl transferase, group 2 family protein

ORF00471 nrdI protein (nrdI)

ORF00473 expressed protein of unknown function 

ORF00474 conserved hypothetical protein 

ORF00507 conserved hypothetical protein 

ORF00525 bioY family protein 

ORF00528 thiolase 

ORF00531 AMP-binding enzyme domain protein 

ORF00548 YGGT family protein 

ORF00565 exodeoxyribonuclease VII, small subunit (xseB) 

ORF00568 arginine repressor ArgR, putative 

ORF00572 expressed putative lipase/acylhydrolase 

ORF00573 conserved hypothetical protein 

ORF00586 iron-sulfur cluster-binding protein, putative

ORF00592 oxidoreductase, short chain dehydrogenase/reductase family 

ORF00604 dipeptidase 

ORF00611 voltage-gated chloride channel family protein

ORF00619 prophage LambdaSa1, repressor protein, putative

ORF00622 conserved hypothetical protein 

ORF00627 prophage LambdaSa1, antirepressor, putative

ORF00634 conserved hypothetical protein

ORF00648 conserved hypothetical protein 
ORF00654 conserved hypothetical protein

ORF00655 conserved hypothetical protein 

ORF00656 conserved hypothetical protein 

ORF00658 conserved hypothetical protein 

ORF00659 conserved hypothetical protein 

ORF00660 prophage LambdaSa1, structural protein, putative

ORF00662 conserved hypothetical protein 

ORF00663 conserved hypothetical protein

ORF00664 conserved hypothetical protein

ORF00665 conserved hypothetical protein 

ORF00666 prophage LambdaSa1, structural protein

ORF00668 conserved hypothetical protein

ORF00669 prophage LambdaSa1, pblA protein, internal deletion

ORF00677 prophage LambdaSa1, lysin, putative

ORF00679 conserved hypothetical protein

ORF00695 transposase OrfB, IS3 family, truncation 

ORF00697 conserved hypothetical protein

ORF00707 conserved domain protein

ORF00713 acid phosphatase precursor, class B

ORF00720 transposase OrfB, IS3 family FRAMESHIFT 

ORF00721 transposase OrfA, IS3 family

ORF00751 cylA protein (cylA)

ORF00755 cylI protein (cylI)

ORF00760 serine protease, subtilase family, putative POINT MUTATION

ORF00781 transcriptional regulator, LysR family

ORF00783 regulatory protein, putative

ORF00785 IS1548, transposase

ORF00786 regulatory protein, putative, truncation

ORF00787 D-lactate dehydrogenase (ldhA)

ORF00801 glycosyl transferase, group 1 family protein 

ORF00805 conserved hypothetical protein 

ORF00826 phage shock protein C, putative

ORF00833 conserved hypothetical protein 

ORF00845 hydrolase, haloacid dehalogenase-like family 

ORF00852 conserved hypothetical protein

ORF00853 expressed putative lipoprotein 

ORF00857 IS1548, transposase 

ORF00890 conserved hypothetical protein

ORF00902 conserved hypothetical protein

ORF00926 membrane protein, putative

ORF00927 membrane protein, putative

ORF00987 conserved hypothetical protein

ORF01009 expressed protein of unknown function 

ORF01010 lipoyl-binding domain protein

ORF01011 oxidoreductase, putative 

ORF01012 conserved hypothetical protein 

ORF01024 expressed putative lipoprotein 

ORF01061 signal peptidase I, putative 

ORF01064 IS1548, transposase

ORF01084 glyoxalase family protein

ORF01104 SatD 

ORF01126 conserved hypothetical protein

ORF01191 conserved hypothetical protein

ORF01192 conserved hypothetical protein 

ORF01193 glycine cleavage system H protein, putative 
ORF01194 bacterial luciferase family protein

ORF01195 oxidoreductase, FMN-binding

ORF01197 lipoate-protein ligase A family protein 

ORF01202 IS861, transposase OrfA 

ORF01223 drug resistance transporter, EmrB/QacA family, putative 

ORF01224 conserved hypothetical protein 

ORF01225 potassium uptake protein, putative 

ORF01237 membrane protein, putative

ORF01249 dihydroneopterin aldolase (folB)

ORF01256 polysaccharide deacetylase family protein

ORF01273 transcriptional regulator, GntR family/potassium uptake protein, TrkA family

ORF01280 conserved hypothetical protein 

ORF01281 conserved hypothetical protein

ORF01289 lipoprotein, putative

ORF01291 conserved hypothetical protein

ORF01298 conserved hypothetical protein

ORF01318 conserved hypothetical protein

ORF01320 voltage-gated chloride channel family protein, putative 

ORF01322 arsenate reductase (arsC) 

ORF01340 dTDP-glucose 4,6-dehydratase (rfbB) 

ORF01341 dTDP-4-dehydrorhamnose 3,5-epimerase

ORF01342 glucose-1-phosphate thymidylyltransferase (rfbA)

ORF01356 hypothetical protein 

ORF01368 conserved hypothetical protein

ORF01374 ISSdy1, transposase OrfB

ORF01388 transposase OrfA, IS3 family

ORF01389 transposase OrfB, IS3 family, truncation

ORF01391 ISSdy1, transposase OrfB FRAMESHIFT

ORF01396 transcriptional regulator, Cro/CI family

ORF01419 repressor protein, putative 

ORF01461 amino acid permease

ORF01469 conserved hypothetical protein 

ORF01483 sensor histidine kinase 

ORF01485 GTP pyrophosphokinase family protein 

ORF01490 5'-nucleotidase family protein

ORF01509 2-dehydropantoate 2-reductase, putative

ORF01510 regulatory protein, putative

ORF01522 carbamoyl-phosphate synthase, large subunit, putative 

ORF01542 sulfatase 

ORF01549 conserved hypothetical protein

ORF01554 iron compound ABC transporter, substrate-binding protein

ORF01557 conserved hypothetical protein

ORF01563 conserved hypothetical protein TIGR01212

ORF01583 glycosyltransferase, group 2 family protein 

ORF01584 glycosyltransferase, group 2 family protein

ORF01585 glycosyltransferase, putative

ORF01586 dTDP-4-dehydrorhamnose reductase (rfbD)

ORF01593 conserved hypothetical protein

ORF01599 conserved hypothetical protein 

ORF01600 glycerol-3-phosphate transporter, putative 

ORF01639 conserved hypothetical protein 

ORF01650 nitroreductase family protein

ORF01653 amino acid permease

ORF01665 transcriptional regulator, MutR family

ORF01683 MutT/nudix family protein 
ORFxxxxx Annotation

ORF01686 67 kDa Myosin-crossreactive streptococcal antigen 

ORF01688 peptide methionine sulfoxide reductase (msrA) 

ORF01694 peptide ABC transporter, permease protein 

ORF01704 conserved hypothetical protein 

ORF01705 IS861, transposase OrfA 

ORF01741 membrane protein, putative

ORF01770 conserved hypothetical protein

ORF01772 IS1548, transposase

ORF01790 conserved hypothetical protein 

ORF01794 conserved hypothetical protein 

ORF01800 amino acid ABC transporter, substrate-binding protein

ORF01810 IS1548, transposase

ORF01827 sodium:dicarboxylate symporter family protein 

ORF01877 immunogenic secreted protein, putative 

ORF01913 transcriptional regulator, Cro/CI family

ORF01928 membrane protein, putative 

ORF01931 transporter, putative 

ORF01932 transcriptional regulator, Crp/Fnr family 

ORF01947 transcriptional regulator, MerR family

ORF01970 acid phosphatase

ORF02002 amino acid ABC transporter, permease protein

ORF02028 perfringolysin O regulator protein (pfoR)

ORF02029 conserved hypothetical protein 

ORF02031 expressed protein of unknown function 

ORF02032 expressed protein of unknown function

ORF02035 deoxynucleoside kinase family protein

ORF02042 alkyl hydroperoxide reductase, subunit C (ahpC)

ORF02126 transcriptional regulator, MarR family

ORF02128 N-acetylmuramoyl-L-alanine amidase, family 4 protein

ORF02135 malate oxidoreductase 

ORF02136 citrate carrier protein, CCS family 

ORF02137 sensor histidine kinase family protein 

ORF02138 response regulator

ORF02166 conserved hypothetical protein

ORF02169 PTS system, IIB component

ORF02170 PTS system, IIA component, putative

ORF02202 ABC transporter, ATP-binding protein 

ORF02262 ABC transporter, ATP-binding protein

ORF02270 CAMP factor (cfb)

ORF02280 serine protease, subtilase family, putative 

ORF02286 major facilitator family protein 

ORF02292 preprotein translocase, SecE subunit, putative 

ORF02295 Lyme disease proteins of unknown function, putative 

ORF02298 Na+ dependent nucleoside transporter

ORF02301 transcriptional regulator, GntR family 

ORF02313 virulence factor MviM, putative

ORF02316 membrane protein, putative

ORF02319 conserved hypothetical protein TIGR00250

ORF02328 transporter, putative

ORF02331 cold shock protein, CSD family

ORF02332 DNA mismatch repair protein HexA (hexA)

ORF02335 conserved hypothetical protein 

ORF02372 conserved hypothetical protein 

ORF02383 expressed putative lipoprotein

ORF02393 transporter, putative
ORF02398 transcriptional regulator, Crp/Fnr family



ORF02399 conserved hypothetical protein



ORF02401 acetyltransferase, GNAT family 



ORF02403 arginine/ornithine antiporter (arcD)



ORF03002 conserved hypothetical protein, truncation 
ORFxxxxx Annotation



ORF00008 protease, putative



ORF00010 acyl carrier protein (acpP) 



ORF00016 acetyltransferase, GNAT family 



ORF00018 peptidase, M23/M37 family, putative secreted protein



ORF00035 membrane protein, putative 



ORF00087 lipoprotein, putative 



ORF00088 hypothetical protein 



ORF00089 hypothetical protein 



ORF00091 conserved hypothetical protein 



ORF00117 ribose ABC transporter, periplasmic D-ribose-binding protein (rbsB) 



ORF00118 ribose ABC transporter, permease protein (rbsC) 



ORF00120 ribose ABC transporter protein RbsD (rbsD)



ORF00121 ribokinase (rbsK) 



ORF00123 hypothetical protein



ORF00130 argininosuccinate lyase (argH) 



ORF00137 conserved hypothetical protein 



ORF00138 hypothetical protein 



ORF00166 4-diphosphocytidyl-2C-methyl-D-erythritol kinase (ispE)



ORF00182 conserved domain protein 



ORF00186 transcriptional regulator, Cro/CI family



ORF00187 hypothetical protein 



ORF00188 hypothetical protein 



ORF00192 hypothetical protein 



ORF00193 conserved hypothetical protein



ORF00196 conserved hypothetical protein



ORF00199 hydrolase, haloacid dehalogenase-like family 



ORF00200 sensor histidine kinase, putative



ORF00201 response regulator



ORF00203 conserved hypothetical protein 



ORF00204 membrane protein, putative 



ORF00205 hypothetical protein



ORF00228 lipoprotein, putative



ORF00234 hypothetical protein 



ORF00235 hypothetical protein 



ORF00238 hypothetical protein 



ORF00240 transcriptional regulator, Cro/CI family 



ORF00241 hypothetical protein 



ORF00242 conserved hypothetical protein 



ORF00243 hypothetical protein 



ORF00244 conserved domain protein 



ORF00245 conserved hypothetical protein, fusion



ORF00246 replication initiation protein, putative 



ORF00247 hypothetical protein



ORF00248 recombination protein 



ORF00249 hypothetical protein 



ORF00252 conserved hypothetical protein 



ORF00253 hypothetical protein 



ORF00254 hypothetical protein 



ORF00255 hypothetical protein 



ORF00256 hypothetical protein 



ORF00257 hypothetical protein 



ORF00258 hypothetical protein 



ORF00259 hypothetical protein 



ORF00260 hypothetical protein 



ORF00272 expressed putative lipoprotein 
ORF00273 hypothetical protein

ORF00274 hypothetical protein

ORF00275 hypothetical protein

ORF00276 hypothetical protein 

ORF00278 membrane protein, putative

ORF00279 transcriptional regulator, Cro/CI family 

ORF00280 acetyltransferase, GNAT family 

ORF00281 acetyltransferase, GNAT family

ORF00283 conserved hypothetical protein

ORF00284 RNA polymerase sigma factor, ECF subfamily

ORF00285 lipoprotein, putative

ORF00287 transcriptional regulator, TetR family 

ORF00288 ABC transporter efflux protein, DrrB family, putative 

ORF00292 hypothetical protein 

ORF00294 expressed protein of unknown function 

ORF00298 acyl carrier protein phosphodiesterase, putative 

ORF00308 conserved hypothetical protein 

ORF00324 conserved hypothetical protein 

ORF00332 hypothetical protein 

ORF00340 hypothetical protein 

ORF00347 conserved hypothetical protein 

ORF00384 hypothetical protein 

ORF00402 membrane protein, putative 

ORF00408 hypothetical protein 

ORF00409 membrane protein, putative 

ORF00414 conserved hypothetical protein

ORF00416 hypothetical protein

ORF00417 hypothetical protein 

ORF00433 copper-transporter protein CopZ 

ORF00448 hypothetical protein 

ORF00466 conserved hypothetical protein 

ORF00467 acetyltransferase, GNAT family

ORF00475 conserved domain protein

ORF00476 hypothetical protein

ORF00478 carboxymuconolactone decarboxylase family protein 

ORF00479 conserved hypothetical protein 

ORF00486 transcriptional regulator, AraC family

ORF00487 surface protein Rib

ORF00488 transposase, IS256 family, truncation

ORF00489 DNA-damage-inducible protein J, putative

ORF00490 hypothetical protein 

ORF00491 lipoprotein, putative 

ORF00493 bacteriophage L54a, integrase, truncation 

ORF00497 conserved domain protein 

ORF00503 oxidoreductase, Gfo/Idh/MocA family

ORF00506 transposase, IS256 family

ORF00510 bacteriocin transport accessory protein, putative

ORF00512 hypothetical protein

ORF00526 biotin synthetase (bioB)

ORF00527 hypothetical protein 

ORF00533 type IV prepilin peptidase-related protein 

ORF00538 conserved hypothetical protein 

ORF00556 hypothetical protein 

ORF00563 expressed protein of unknown function

ORF00575 hypothetical protein 
ORF00584 conserved hypothetical protein



ORF00585 fructose-1,6-bisphosphatase, putative
ORF00590 carboxymethylenebutenolidase-related protein



ORF00597 conserved hypothetical protein



ORF00598 inosine-uridine preferring nucleoside hydrolase 



ORF00599 hypothetical protein 



ORF00600 OsmC/Ohr family protein 



ORF00608 adenosine deaminase, putative 



ORF00610 chorismate mutase, putative



ORF00615 prophage LambdaSa1, site-specific recombinase, phage integrase family
ORF00617 conserved domain protein 



ORF00618 hypothetical protein



ORF00620 hypothetical protein



ORF00621 conserved hypothetical protein



ORF00623 hypothetical protein



ORF00624 hypothetical protein 



ORF00626 prophage LambdaSa1, transcriptional regulator, Cro/CI family



ORF00628 hypothetical protein



ORF00630 hypothetical protein



ORF00632 hypothetical protein 



ORF00633 conserved hypothetical protein 



ORF00635 hypothetical protein



ORF00636 hypothetical protein



ORF00637 hypothetical protein 



ORF00638 conserved hypothetical protein



ORF00639 conserved domain protein



ORF00641 prophage LambdaSa1, reverse transcriptase/maturase family protein



ORF00642 conserved hypothetical protein



ORF00643 conserved hypothetical protein
ORF00644 hypothetical protein 



ORF00645 hypothetical protein



ORF00646 conserved hypothetical protein 



ORF00647 hypothetical protein



ORF00649 hypothetical protein 



ORF00650 hypothetical protein



ORF00652 conserved hypothetical protein 



ORF00653 conserved hypothetical protein 



ORF00657 conserved hypothetical protein, truncation



ORF00661 conserved hypothetical protein



ORF00667 conserved hypothetical protein 



ORF00670 prophage LambdaSa1, minor structural protein, putative



ORF00671 prophage LambdaSa1, N-acetylmuramoyl-L-alanine amidase, family 4
ORF00672 prophage LambdaSa1, minor structural protein, putative
ORF00673 hypothetical protein
ORF00674 hypothetical protein



ORF00675 conserved hypothetical protein



ORF00676 conserved hypothetical protein



ORF00678 conserved hypothetical protein



ORF00681 conserved hypothetical protein 



ORF00682 hypothetical protein

ORF00683 prophage LambdaSa1, site-specific recombinase, phage integrase family FRAMESHIFT
ORF00685 conserved hypothetical protein
ORF00689 conserved hypothetical protein, FRAMESHIFT



ORF00698 hypothetical protein



ORF00703 phosphoserine phosphatase SerB (serB)
ORF00704 MutT/nudix family protein



ORF00712 hypothetical protein



ORF00718 cell wall surface protein, interruption-N
ORF00723 hypothetical protein



ORF00726 transcriptional regulator, AraC family



ORF00727 expressed cell wall surface anchor family protein
ORF00728 expressed cell wall surface anchor family protein
ORF00735 expressed protein of unknown function



ORF00737 conserved hypothetical protein, degenerate
ORF00738 hypothetical protein
ORF00740 hypothetical protein



ORF00741 hypothetical protein



ORF00742 lipoprotein, putative



ORF00747 cylD protein (cylD)



ORF00749 acyl carrier protein AcpC



ORF00750 cylZ protein FRAMESHIFT



ORF00752 cylB protein (cylB)



ORF00753 cylE protein (cylE)



ORF00754 cylF protein (cylF)



ORF00756 cylJ protein (cylJ)



ORF00757 cylK protein (cylK)



ORF00758 hypothetical protein



ORF00759 putative secreted protein



ORF00761 hypothetical protein



ORF00766 expressed putative secreted protein
ORF00767 hypothetical protein



ORF00768 conserved domain protein



ORF00769 permease, putative



ORF00775 conserved hypothetical protein



ORF00777 DedA family protein, putative



ORF00779 membrane protein, putative



ORF00788 sodium:galactoside symporter family protein, putative



ORF00791 transcriptional regulator, GntR family



ORF00793 glucuronate isomerase (uxaC)



ORF00794 mannonate dehydratase (uxuA)



ORF00795 D-mannonate oxidoreductase



ORF00796 hydrolase, haloacid dehalogenase-like family



ORF00797 glycosyl hydrolase, family 3



ORF00806 conserved hypothetical protein



ORF00822 ABC transporter, ATP-binding protein



ORF00827 hypothetical protein



ORF00834 conserved hypothetical protein



ORF00838 membrane protein, putative



ORF00839 Mn2+/Fe2+ transporter, NRAMP family



ORF00848 conserved domain protein



ORF00872 cell wall surface anchor family protein



ORF00874 conserved hypothetical protein



ORF00878 ABC transporter, permease protein



ORF00879 YaeC family protein, putative

ORF00888 hydrolase, haloacid dehalogenase-like family



ORF00891 conserved domain protein



ORF00898 conserved hypothetical protein



ORF00900 permease, GntP family



ORF00903 transcriptional regulator, MarR family



ORF00907 glutathione S-transferase family protein
ORF00909 hypothetical protein



ORF00921 membrane protein, putative 



ORF00922 glycosyl transferase, family 8



ORF00923 hypothetical protein 



ORF00924 conserved hypothetical protein



ORF00939 conserved hypothetical protein 



ORF00942 expressed putative secreted protein 



ORF00943 hypothetical protein 



ORF00944 hypothetical protein 



ORF00946 conserved hypothetical protein 



ORF00950 hypothetical protein 



ORF00951 transcriptional regulator, TenA family 



ORF00972 ATP synthase F0, C subunit (atpE)



ORF00980 conserved hypothetical protein 



ORF00982 conserved hypothetical protein 



ORF01003 conserved hypothetical protein 



ORF01004 conserved hypothetical protein 



ORF01013 hypothetical protein



ORF01014 hypothetical protein 



ORF01015 hypothetical protein 



ORF01016 hypothetical protein 



ORF01018 hypothetical protein 



ORF01019 hypothetical protein



ORF01021 hypothetical protein 



ORF01025 HD domain protein



ORF01026 acetyltransferase, GNAT family 



ORF01032 chloramphenicol acetyltransferase (cat) 



ORF01034 Tn916, transposase



ORF01035 Tn916, excisionase 



ORF01037 Tn916, hypothetical protein



ORF01038 Tn916, hypothetical protein



ORF01039 Tn916, transcriptional regulator, putative



ORF01041 Tn916, hypothetical protein



ORF01042 Tn916, NLP/P60 family protein



ORF01044 membrane protein, putative FRAMESHIFT 



ORF01048 Tn916, hypothetical protein 



ORF01049 Tn916, hypothetical protein 



ORF01050 Tn916, hypothetical protein 



ORF01051 Tn916, transcriptional regulator, putative



ORF01052 Tn916, FtsK/SpoIIIE family protein



ORF01053 Tn916, hypothetical protein



ORF01054 Tn916, hypothetical protein



ORF01062 hypothetical protein 



ORF01086 Na+/H+ exchanger family protein 



ORF01092 acetyltransferase, GNAT family 



ORF01096 nisin-resistance protein, putative 



QRF01103 conserved hypothetical protein 



ORF01124 acetyltransferase, GNAT family 



ORF01133 iron-compound ABC transporter, iron-compound-binding protein 



ORF01140 conserved hypothetical protein 



ORF01142 carbon starvation protein CstA, putative 



ORF01143 response regulator 



ORF01144 sensor histidine kinase, putative 



ORF01145 lipoprotein, putative



ORF01146 conserved hypothetical protein, FRAMESHIFT 
ORF01148 lipoprotein, putative



ORF01149 hypothetical protein



ORF01150 hypothetical protein 



ORF01151 hypothetical protein 



ORF01152 lipoprotein, putative 



ORF01153 hypothetical protein 



ORF01157 conserved hypothetical protein 



ORF01158 hypothetical protein 



ORF01159 hypothetical protein 



ORF01160 expressed protein of unknown function FRAMESHIFT 



ORF01161 expressed conserved domain protein 



ORF01162 conserved hypothetical protein 



ORF01164 FtsK/SpoIIIE family protein FRAMESHIFT



ORF01166 hypothetical protein 



ORF01167 conserved hypothetical protein 



ORF01168 conserved hypothetical protein 



ORF01169 hypothetical protein 



ORF01172 phage infection protein, putative



ORF01173 conserved hypothetical protein 



ORF01174 conserved domain protein 



ORF01175 hypothetical protein 



ORF01182 membrane protein, putative 



ORF01186 cell wall surface anchor family protein, putative



ORF01187 hypothetical protein 



ORF01204 hypothetical protein 



ORF01215 hypothetical protein 



ORF01241 transcriptional regulator, AraC family, putative 



ORF01253 rarD protein (rarD) 



ORF01257 transporter, BCCT family protein 



ORF01258 hypothetical protein 



ORF01261 expressed protein of unknown function



ORF01262 conserved hypothetical protein, FRAMESHIFT 



ORF01263 hypothetical protein 



ORF01265 hypothetical protein



ORF01266 hypothetical protein 



ORF01269 conserved hypothetical protein



ORF01272 conserved hypothetical protein 



ORF01277 conserved hypothetical protein 



ORF01287 conserved hypothetical protein 



ORF01288 membrane protein, putative 

ORF01299 CMP-N-acetylneuraminic acid synthetase NeuA (neuA) 



ORF01300 neuD protein (neuD) 



ORF01301 UDP-N-acetylglucosamine-2-epimerase NeuC (neuC) 



ORF01302 N-acetylneuraminic acid synthetase NeuB (neuB)



ORF01303 polysaccharide biosynthesis protein CpsL (cpsL) 



ORF01304 polysaccharide biosynthesis protein CpsK(V) (cpsK) 



ORF01307 glycosyltransferase CpsN(V) (cpsN)



ORF01308 polysaccharide biosynthesis protein CpsM(V) (cpsM) 



ORF01309 polysaccharide biosynthesis protein CpsH(V) (cpsH)



ORF01310 glycosyltransferase CpsG(V) (cpsG) 



ORF01311 polysaccharide biosynthesis protein CpsF (cpsF) 



ORF01312 glycosyltransferase CpsE (cpsE) 



ORF01348 conserved domain protein 



ORF01349 hypothetical protein 



ORF01370 conserved hypothetical protein 
ORF01371 conserved hypothetical protein



ORF01372 expressed protein of unknown function 



ORF01373 ISSdy1, transposase OrfA



ORF01375 conserved hypothetical protein



ORF01379 transposase OrfB, IS3 family, truncation 



ORF01382 GBSi1, group II intron, maturase



ORF01384 hypothetical protein 



ORF01385 hypothetical protein



ORF01386 conserved hypothetical protein 



ORF01387 conserved hypothetical protein, truncation 



ORF01390 ISSdy1, transposase OrfA FRAMESHIFT



ORF01392 hypothetical protein 



ORF01393 hypothetical protein 



ORF01394 site-specific recombinase, phage integrase family 



ORF01395 conserved hypothetical protein 



ORF01401 transposase, ISL3 family 



ORF01404 mercuric resistance operon regulatory protein MerR (merR) 



ORF01408 cadmium efflux system accessory protein (CadC) 



ORF01409 conserved hypothetical protein 



ORF01410 hypothetical protein 



ORF01417 hypothetical protein 



ORF01418 hypothetical protein 



ORF01420 hypothetical protein 



ORF01421 ImpB/MucB/SamB family protein 



ORF01423 conserved hypothetical protein



ORF01424 conserved hypothetical protein 



ORF01425 conserved hypothetical protein 



ORF01426 conserved hypothetical protein 



ORF01427 hypothetical protein 



ORF01428 conserved hypothetical protein 



ORF01430 hypothetical protein 



ORF01431 hypothetical protein 



ORF01432 conserved domain protein 



ORF01433 SNF2 family protein 



ORF01434 hypothetical protein 



ORF01435 calcium-binding protein, putative 



ORF01436 agglutinin receptor (ssp-5) 



ORF01437 abortive infection protein AbiGI (abiGI)



ORF01438 abortive infection protein AbiGII (abiGII)



ORF01439 conserved hypothetical protein 



ORF01440 expressed protein of unknown function



ORF01441 conserved hypothetical protein, degenerate 



ORF01442 membrane protein, putative 



ORF01443 hypothetical protein



ORF01444 Tn5252, Orf 21 protein, internal deletion 



ORF01445 hypothetical protein 



ORF01450 conserved hypothetical protein



ORF01452 hypothetical protein 



ORF01454 conserved hypothetical protein 



ORF01459 hypothetical protein



ORF01460 homocysteine S-methyltransferase MmuM, putative 



ORF01463 hypothetical protein 



ORF01464 hypothetical protein 



ORF01465 hypothetical protein 



ORF01466 transcriptional regulator, TetR family 
ORF01477 glutathione S-transferase family protein, putative



ORF01478 conserved domain protein 



ORF01486 hypothetical protein 



ORF01488 R5 protein 



ORF01489 transcriptional regulator, MarR family, putative



ORF01494 membrane protein, putative 



ORF01497 acetyltransferase, GNAT family 



ORF01502 hypothetical protein 



ORF01503 conserved hypothetical protein



ORF01508 surface antigen-related protein 



ORF01535 conserved hypothetical protein 



ORF01547 conserved hypothetical protein 



ORF01566 expressed cell wall surface anchor family protein 



ORF01572 glycosyltransferase, group 1 family protein 



ORF01573 glycosyltransferase, group 2 family protein 



ORF01575 membrane protein, putative 



ORF01576 glycosyltransferase, group 2 family protein 



ORF01577 glycosyltransferase, group 2 family protein 



ORF01578 nucleotide sugar dehydratase, putative



ORF01581 lipoprotein, putative 



ORF01582 conserved hypothetical protein 



ORF01596 ammonium transporter family protein 



ORF01597 conserved hypothetical protein



ORF01601 hypothetical protein 



ORF01608 proton/peptide symporter family protein



ORF01611 hypothetical protein 



ORF01615 conserved domain protein 



ORF01638 conserved hypothetical protein 



ORF01641 conserved hypothetical protein 



ORF01645 cell wall surface anchor family protein 



ORF01660 membrane protein, putative 



ORF01661 ABC transporter, ATP binding protein 



ORF01666 hypothetical protein



ORF01667 hypothetical protein 



ORF01670 hypothetical protein 
ORF01672 protease, putative, POINT MUTATION 



ORF01673 hypothetical protein 



ORF01674 hypothetical protein 



ORF01675 hypothetical protein 



ORF01680 tetracenomycin polyketide synthesis O-methyltransferase TcmP, putative



ORF01681 hypothetical protein



ORF01682 hypothetical protein 



ORF01684 hypothetical protein 

ORF01692 peptide ABC transporter, ATP-binding protein 



ORF01695 peptide ABC transporter, permease protein
ORF01696 peptide ABC transporter, peptide-binding protein 



ORF01699 transposase, IS30 family, putative



ORF01700 transporter, major facilitator family 



ORF01703 transcriptional regulator, LysR family



ORF01715 conserved hypothetical protein



ORF01719 hypothetical protein 
ORF01720 conserved hypothetical protein 



ORF01721 glyoxalase family protein 



ORF01727 conserved hypothetical protein 



ORF01729 acetyltransferase, GNAT family 



8 



S genes not shared with GAS or pneumococcus



ORFxxxxx Annotation 

ORF01730 glycosyl transferase, group 2 family protein 

ORF01733 hypothetical protein 

ORF01734 conserved hypothetical protein 

ORF01735 hypothetical protein

ORF01736 hypothetical protein

ORF01737 hypothetical protein

ORF01742 hypothetical protein

ORF01743 PTS system component, putative 

ORF01744 conserved hypothetical protein 

ORF01748 D-isomer specific 2-hydroxyacid dehydrogenase family protein

ORF01753 conserved hypothetical protein 

ORF01754 hypothetical protein 

ORF01761 transposase, IS30 family, putative, truncation 

ORF01778 amino acid permease, putative 

ORF01807 hypothetical protein 

ORF01836 hypothetical protein

ORF01838 hypothetical protein 

ORF01839 dihydroxyacetone kinase family protein

ORF01840 transcriptional regulator, TetR family, putative 

ORF01842 hypothetical protein 

ORF01843 dihydroxyacetone kinase family protein

ORF01844 dihydroxyacetone kinase family protein

ORF01847 conserved hypothetical protein 

ORF01850 hypothetical protein

ORF01863 pyruvate phosphate dikinase (ppdK) 

ORF01864 expressed protein of unknown function 

ORF01865 CBS domain protein 

ORF01866 3-hydroxyacyl-CoA dehydrogenase family protein, putative secreted protein

ORF01892 hypothetical protein 

ORF01893 hypothetical protein

ORF01894 conserved hypothetical protein

ORF01895 hypothetical protein

ORF01896 hypothetical protein 

ORF01897 hypothetical protein 

ORF01898 hypothetical protein

ORF01899 hypothetical protein 

ORF01903 conserved hypothetical protein 

ORF01904 drug resistance transporter, EmrB/QacA family

ORF01905 hypothetical protein 

ORF01922 conserved hypothetical protein 

ORF01925 FMN-binding protein

ORF01934 hypothetical protein 

ORF01936 polyprenyl synthetase family protein 

ORF01939 cytochrome d ubiquinol oxidase, subunit II (cydB) 

ORF01940 cytochrome d oxidase, subunit I (cydA) 

ORF01941 pyridine nucleotide-disulphide oxidoreductase family protein 

ORF01942 prenyltransferase, UbiA family 

ORF01943 hypothetical protein 

ORF01944 hypothetical protein

ORF01946 cyclopropane-fatty-acyl-phospholipid synthase (cfa)

ORF01951 conserved hypothetical protein 

ORF01953 hypothetical protein

ORF01954 conserved hypothetical protein

ORF01984 hypothetical protein 

ORF01988 hypothetical protein

ORF01989 hypothetical protein

ORF01990 hypothetical protein

ORF01991 hypothetical protein

ORF02000 membrane protein, putative

ORF02001 transposase, IS30 family, putative

ORF02005 hypothetical protein

ORF02006 xylulose-5-phosphate/fructose-6-phosphate phosphoketolase (xfp)

ORF02009 conserved hypothetical protein

ORF02010 carbohydrate kinase, FGGY family

ORF02011 hypothetical protein

ORF02012 PTS system component, putative

ORF02015 glyoxylate reductase, NADH-dependent

ORF02016 hypothetical protein

ORF02025 hypothetical protein

ORF02026 hypothetical protein

ORF02030 glutamate-cysteine ligase-related protein

ORF02036 phosphinothricin N-acetyltransferase (pat)

ORF02039 conserved hypothetical protein

ORF02044 conserved hypothetical protein

ORF02045 conserved hypothetical protein

ORF02046 prophage LambdaSa2, lysin, putative

ORF02047 prophage LambdaSa2, holin, putative

ORF02048 conserved hypothetical protein

ORF02049 hypothetical protein

ORF02050 conserved domain protein

ORF02051 prophage LambdaSa2, PblB, putative

ORF02053 conserved hypothetical protein

ORF02056 conserved hypothetical protein

ORF02057 hypothetical protein

ORF02058 hypothetical protein

ORF02059 conserved hypothetical protein

ORF02060 conserved hypothetical protein

ORF02061 hypothetical protein

ORF02062 hypothetical protein

ORF02063 conserved domain protein

ORF02064 conserved domain protein

ORF02066 prophage LambdaSa2, protease, putative

ORF02067 conserved hypothetical protein

ORF02068 prophage LambdaSa2, terminase large subunit, putative

ORF02069 hypothetical protein

ORF02070 hypothetical protein

ORF02071 prophage LambdaSa2, site-specific recombinase, phage integrase family

ORF02072 conserved hypothetical protein

ORF02073 prophage LambdaSa2, transcriptional regulator, Cro/CI family

ORF02075 hypothetical protein

ORF02077 hypothetical protein

ORF02078 conserved hypothetical protein

ORF02079 conserved hypothetical protein

ORF02080 conserved hypothetical protein

ORF02081 hypothetical protein

ORF02084 prophage LambdaSa2, bacteriophage replication protein/hypothetical protein, truncation/fusion

ORF02085 hypothetical protein

ORF02087 hypothetical protein

ORF02088 conserved hypothetical protein

ORF02089 prophage LambdaSa2, HNH endonuclease family protein

ORF02090 prophage LambdaSa2, antirepressor protein, putative

ORF02091 conserved domain protein 

ORF02092 hypothetical protein 

ORF02093 hypothetical protein 

ORF02094 hypothetical protein 

ORF02095 prophage LambdaSa2, repressor protein, putative 

ORF02097 hypothetical protein 

ORF02098 prophage LambdaSa2, site-specific recombinase, phage integrase family

ORF02100 hypothetical protein

ORF02102 hypothetical protein

ORF02103 microcin immunity protein MccF, putative 

ORF02105 oxidoreductase, Gfo/Idh/MocA family

ORF02108 hypothetical protein 

ORF02109 Cyclic nucleotide-binding domain protein 

ORF02119 hypothetical protein

ORF02124 hypothetical protein 

ORF02125 nitroreductase family protein 

ORF02134 bacteriocin transport accessory protein, putative 

ORF02148 neuraminidase-related protein

ORF02160 2',3'-cyclic-nucleotide 2'-phosphodiesterase (cpdB)

ORF02163 conserved hypothetical protein

ORF02171 membrane protein, putative

ORF02172 hypothetical protein

ORF02173 membrane protein, putative 

ORF02175 conserved hypothetical protein, truncation 

ORF02181 phosphate transport system regulatory protein PhoU, putative 

ORF02187 hypothetical protein 

ORF02190 conserved hypothetical protein

ORF02191 hypothetical protein 

ORF02194 acetyltransferase, GNAT family 

ORF02196 hypothetical protein 

ORF02198 acetyltransferase, GNAT family 

ORF02201 membrane protein, putative

ORF02203 hypothetical protein 

ORF02205 transcriptional regulator, Cro/CI family

ORF02206 conserved hypothetical protein 

ORF02207 conserved hypothetical protein TIGR00730

ORF02208 hypothetical protein 

ORF02209 site-specific recombinase, phage integrase family 

ORF02210 conserved hypothetical protein 

ORF02211 conserved hypothetical protein

ORF02212 hypothetical protein 

ORF02213 hypothetical protein 

ORF02214 transcriptional regulator, Cro/CI family

ORF02215 expressed protein of unknown function 

ORF02216 site-specific recombinase, phage integrase family 

ORF02217 conserved hypothetical protein

ORF02219 hypothetical protein 

ORF02221 cell wall anchor protein-related protein 

ORF02223 hypothetical protein 

ORF02224 hypothetical protein 

ORF02225 hypothetical protein 

ORF02226 membrane protein, putative 

ORF02227 conjugal transfer protein, interruption-C

ORF02230 conserved hypothetical protein

ORF02231 conserved hypothetical protein 

ORF02232 conserved hypothetical protein 

ORF02235 hypothetical protein

ORF02236 conserved hypothetical protein 

ORF02237 hypothetical protein 

ORF02238 hypothetical protein 

ORF02239 hypothetical protein 

ORF02240 transcriptional regulator, Cro/CI family 

ORF02241 hypothetical protein 

ORF02242 transcriptional regulator, Cro/CI family 

ORF02243 FtsK/SpoIIIE family protein

ORF02244 hypothetical protein 

ORF02245 hypothetical protein 

ORF02246 cell wall surface anchor family protein 

ORF02247 transposase, ISL3 family

ORF02250 mercuric resistance operon regulatory protein MerR (merR) 

ORF02251 Mn2+/Fe2+ transporter, NRAMP family 

ORF02252 membrane protein, putative 

ORF02253 ABC transporter, ATP-binding protein 

ORF02254 conserved hypothetical protein 

ORF02255 streptomycin resistance protein

ORF02257 hypothetical protein 

ORF02258 hypothetical protein 

ORF02259 conserved hypothetical protein 

ORF02260 acetyltransferase, GNAT family

ORF02261 membrane protein, putative 

ORF02263 hypothetical protein 

ORF02264 transcriptional regulator, Cro/CI family 

ORF02265 PAP2 family protein

ORF02266 conserved hypothetical protein, FRAMESHIFT

ORF02267 conserved hypothetical protein TIGR00730 

ORF02268 protease, putative 

ORF02269 rhodanese family protein

ORF02271 hypothetical protein

ORF02274 conserved hypothetical protein

ORF02275 5-methyltetrahydrofolate--homocysteine methyltransferase, putative

ORF02277 conserved hypothetical protein 

ORF02279 hypothetical protein

ORF02282 sensor histidine kinase 

ORF02283 chromosome assembly-related protein 

ORF02287 expressed protein of unknown function 

ORF02291 pathogenicity protein, putative

ORF02308 hydrolase, haloacid dehalogenase-like family

ORF02314 conserved hypothetical protein

ORF02317 hypothetical protein 

ORF02330 hypothetical protein 

ORF02344 site-specific recombinase, phage integrase family 

ORF02345 conserved hypothetical protein 

ORF02346 conserved hypothetical protein 

ORF02347 hypothetical protein 

ORF02349 conserved hypothetical protein 

ORF02350 hypothetical protein 

ORF02351 transcriptional regulator, Cro/CI family 

ORF02352 conserved domain protein

ORF02354 hypothetical protein

ORF02356 expressed putative secreted protein

ORF02362 sensor histidine kinase

ORF02363 response regulator

ORF02367 membrane protein, putative

ORF02368 conserved hypothetical protein

ORF02379 membrane protein, putative

ORF02395 transcriptional regulator, Cro/CI family

ORF02406 membrane protein, putative

ORF02416 diacylglycerol kinase catalytic domain protein, putative

ORF02418 hypothetical protein

ORF02422 hypothetical protein

ORF02425 conserved hypothetical protein

ORF03001 conserved hypothetical protein

ORF03004 conserved hypothetical protein

ORF03005 cylX protein

ORF03006 Tn916, hypothetical protein

ORF03007 Tn916, hypothetical protein

ORF03008 Tn916, hypothetical protein

ORF03009 Tn916, tetM leader peptide

ORF03010 Tn916, hypothetical protein

ORF03012 prophage LambdaSa2, HNH endonuclease family protein

ORF03013 conserved hypothetical protein

ORF03015 conjugal transfer protein, interruption-N

ORF00035 membrane protein, putative



ORF00087 lipoprotein, putative 



ORF00088 hypothetical protein 



ORF00089 hypothetical protein 



ORF00123 hypothetical protein 



ORF00138 hypothetical protein 



ORF00187 hypothetical protein 



ORF00188 hypothetical protein 



ORF00192 hypothetical protein 



ORF00205 hypothetical protein



ORF00228 lipoprotein, putative 



ORF00234 hypothetical protein 



ORF00235 hypothetical protein 



ORF00238 hypothetical protein 



ORF00240 transcriptional regulator, Cro/CI family 



ORF00241 hypothetical protein 



ORF00242 conserved hypothetical protein 



ORF00243 hypothetical protein 



ORF00247 hypothetical protein 



ORF00249 hypothetical protein 



ORF00253 hypothetical protein 



ORF00254 hypothetical protein



ORF00255 hypothetical protein 



ORF00256 hypothetical protein



ORF00257 hypothetical protein 
ORF00258 hypothetical protein 



ORF00259 hypothetical protein
ORF00260 hypothetical protein 



ORF00272 expressed putative lipoprotein 



ORF00273 hypothetical protein 
ORF00274 hypothetical protein



ORF00275 hypothetical protein



ORF00276 hypothetical protein 
ORF00278 membrane protein, putative 



ORF00285 lipoprotein, putative 
ORF00292 hypothetical protein 



ORF00294 expressed protein of unknown function 



ORF00308 conserved hypothetical protein 



ORF00332 hypothetical protein 
ORF00340 hypothetical protein 



ORF00384 hypothetical protein

ORF00402 membrane protein, putative 



ORF00408 hypothetical protein 
ORF00416 hypothetical protein 



ORF00417 hypothetical protein 



ORF00448 hypothetical protein 



ORF00476 hypothetical protein 



ORF00489 DNA-damage-inducible protein J, putative 



ORF00490 hypothetical protein 



ORF00491 lipoprotein, putative
ORF00497 conserved domain protein 



ORF00510 bacteriocin transport accessory protein, putative
ORF00512 hypothetical protein 



ORF00527 hypothetical protein 
ORF00556 hypothetical protein

ORF00575 hypothetical protein

ORF00599 hypothetical protein

ORF00618 hypothetical protein

ORF00620 hypothetical protein 

ORF00623 hypothetical protein 

ORF00626 prophage LambdaSa1, transcriptional regulator, Cro/CI family

ORF00628 hypothetical protein 

ORF00630 hypothetical protein 

ORF00632 hypothetical protein 

ORF00635 hypothetical protein 

ORF00636 hypothetical protein 

ORF00637 hypothetical protein 

ORF00642 conserved hypothetical protein 

ORF00644 hypothetical protein 

ORF00645 hypothetical protein 

ORF00647 hypothetical protein 

ORF00649 hypothetical protein 

ORF00650 hypothetical protein 

ORF00653 conserved hypothetical protein 

ORF00657 conserved hypothetical protein, truncation 

ORF00661 conserved hypothetical protein 

ORF00673 hypothetical protein 

ORF00674 hypothetical protein 

ORF00675 conserved hypothetical protein 

ORF00676 conserved hypothetical protein 

ORF00682 hypothetical protein 

ORF00685 conserved hypothetical protein 

ORF00698 hypothetical protein

ORF00712 hypothetical protein 

ORF00718 cell wall surface protein, interruption-N 

ORF00723 hypothetical protein 

ORF00735 expressed protein of unknown function 

ORF00737 conserved hypothetical protein, degenerate 

ORF00738 hypothetical protein 

ORF00740 hypothetical protein

ORF00741 hypothetical protein 

ORF00747 cylD protein (cylD) 

ORF00753 cylE protein (cylE) 

ORF00756 cylJ protein (cylJ) 

ORF00757 cylK protein (cylK) 

ORF00758 hypothetical protein

ORF00759 putative secreted protein

ORF00761 hypothetical protein

ORF00796 hydrolase, haloacid dehalogenase-like family 

ORF00806 conserved hypothetical protein

ORF00822 ABC transporter, ATP-binding protein 

ORF00827 hypothetical protein 

ORF00872 cell wall surface anchor family protein

ORF00909 hypothetical protein 

ORF00923 hypothetical protein 

ORF00924 conserved hypothetical protein

ORF00942 expressed putative secreted protein

ORF00943 hypothetical protein

ORF00944 hypothetical protein 

ORF01013 hypothetical protein 



Table 12: GBS ORFs not shared with any published genome



ORFxxxxx Annotation



ORF01014 hypothetical protein



ORF01015 hypothetical protein 



ORF01016 hypothetical protein 



ORF01018 hypothetical protein 



ORF01019 hypothetical protein 



ORF01021 hypothetical protein



ORF01035 Tn916, excisionase



ORF01062 hypothetical protein 



ORF01096 nisin-resistance protein, putative 



ORF01145 lipoprotein, putative 



ORF01146 conserved hypothetical protein, FRAMESHIFT 



ORF01148 lipoprotein, putative 



ORF01149 hypothetical protein 



ORF01150 hypothetical protein



ORF01151 hypothetical protein 



ORF01152 lipoprotein, putative 



ORF01153 hypothetical protein 



ORF01158 hypothetical protein 



ORF01159 hypothetical protein 



ORF01161 expressed conserved domain protein



ORF01162 conserved hypothetical protein 



ORF01166 hypothetical protein 



ORF01168 conserved hypothetical protein 



ORF01169 hypothetical protein 



ORF01174 conserved domain protein 



ORF01175 hypothetical protein 



ORF01186 cell wall surface anchor family protein, putative 



ORF01187 hypothetical protein 



ORF01204 hypothetical protein



ORF01215 hypothetical protein 



ORF01258 hypothetical protein 



ORF01262 conserved hypothetical protein, FRAMESHIFT 



ORF01263 hypothetical protein 



ORF01265 hypothetical protein 



ORF01266 hypothetical protein 



ORF01304 polysaccharide biosynthesis protein CpsK(V) (cpsK)



ORF01308 polysaccharide biosynthesis protein CpsM(V) (cpsM)



ORF01309 polysaccharide biosynthesis protein cpsH(V) (cpsH) 



ORF01349 hypothetical protein



ORF01384 hypothetical protein 



ORF01385 hypothetical protein



ORF01386 conserved hypothetical protein 



ORF01392 hypothetical protein 



ORF01395 conserved hypothetical protein 



ORF01409 conserved hypothetical protein 



ORF01410 hypothetical protein 



ORF01417 hypothetical protein 



ORF01418 hypothetical protein



ORF01420 hypothetical protein 



ORF01423 conserved hypothetical protein 



ORF01424 conserved hypothetical protein 



ORF01425 conserved hypothetical protein 



ORF01426 conserved hypothetical protein 



ORF01427 hypothetical protein 



ORF01431 hypothetical protein

ORF01432 conserved domain protein

ORF01434 hypothetical protein

ORF01435 calcium-binding protein, putative

ORF01437 abortive infection protein AbiGI (abiGI)

ORF01438 abortive infection protein AbiGII (abiGII)

ORF01441 conserved hypothetical protein, degenerate

ORF01443 hypothetical protein

ORF01445 hypothetical protein

ORF01452 hypothetical protein

ORF01459 hypothetical protein

ORF01463 hypothetical protein

ORF01464 hypothetical protein

ORF01465 hypothetical protein

ORF01486 hypothetical protein

ORF01488 R5 protein

ORF01575 membrane protein, putative

ORF01581 lipoprotein, putative

ORF01601 hypothetical protein

ORF01611 hypothetical protein

ORF01638 conserved hypothetical protein

ORF01645 cell wall surface anchor family protein

ORF01660 membrane protein, putative

ORF01666 hypothetical protein

ORF01667 hypothetical protein

ORF01670 hypothetical protein

ORF01673 hypothetical protein

ORF01674 hypothetical protein

ORF01675 hypothetical protein

ORF01681 hypothetical protein

ORF01682 hypothetical protein

ORF01684 hypothetical protein

ORF01719 hypothetical protein

ORF01733 hypothetical protein

ORF01735 hypothetical protein

ORF01736 hypothetical protein

ORF01737 hypothetical protein

ORF01742 hypothetical protein

ORF01754 hypothetical protein

ORF01761 transposase, IS30 family, putative, truncation

ORF01807 hypothetical protein

ORF01836 hypothetical protein

ORF01838 hypothetical protein

ORF01842 hypothetical protein

ORF01850 hypothetical protein

ORF01892 hypothetical protein

ORF01893 hypothetical protein

ORF01895 hypothetical protein

ORF01896 hypothetical protein

ORF01897 hypothetical protein

ORF01898 hypothetical protein

ORF01899 hypothetical protein

ORF01905 hypothetical protein

ORF01934 hypothetical protein

ORF01943 hypothetical protein

ORF01944 hypothetical protein

ORF01953 hypothetical protein 

ORF01984 hypothetical protein 

ORF01988 hypothetical protein 

ORF01989 hypothetical protein 

ORF02005 hypothetical protein

ORF02011 hypothetical protein 

ORF02016 hypothetical protein 

ORF02025 hypothetical protein

ORF02026 hypothetical protein 

ORF02045 conserved hypothetical protein 

ORF02047 prophage LambdaSa2, holin, putative

ORF02048 conserved hypothetical protein 

ORF02049 hypothetical protein 

ORF02050 conserved domain protein 

ORF02053 conserved hypothetical protein 

ORF02057 hypothetical protein

ORF02058 hypothetical protein 

ORF02061 hypothetical protein 

ORF02062 hypothetical protein 

ORF02063 conserved domain protein 

ORF02067 conserved hypothetical protein

ORF02069 hypothetical protein

ORF02070 hypothetical protein

ORF02072 conserved hypothetical protein 

ORF02073 prophage LambdaSa2, transcriptional regulator, Cro/CI family

ORF02075 hypothetical protein 

ORF02077 hypothetical protein 

ORF02078 conserved hypothetical protein

ORF02081 hypothetical protein

ORF02085 hypothetical protein

ORF02087 hypothetical protein

ORF02088 conserved hypothetical protein 

ORF02091 conserved domain protein 

ORF02092 hypothetical protein

ORF02093 hypothetical protein 

ORF02094 hypothetical protein 

ORF02097 hypothetical protein 

ORF02100 hypothetical protein

ORF02102 hypothetical protein 

ORF02108 hypothetical protein 

ORF02119 hypothetical protein 

ORF02124 hypothetical protein 

ORF02171 membrane protein, putative 

ORF02172 hypothetical protein

ORF02173 membrane protein, putative 

ORF02191 hypothetical protein 

ORF02196 hypothetical protein 

ORF02203 hypothetical protein 

ORF02208 hypothetical protein 

ORF02212 hypothetical protein 

ORF02213 hypothetical protein 

ORF02214 transcriptional regulator, Cro/CI family

ORF02215 expressed protein of unknown function 

ORF02217 conserved hypothetical protein 

ORF02219 hypothetical protein

ORF02221 cell wall anchor protein-related protein 

ORF02223 hypothetical protein

ORF02224 hypothetical protein

ORF02225 hypothetical protein

ORF02231 conserved hypothetical protein

ORF02235 hypothetical protein 

ORF02236 conserved hypothetical protein 

ORF02237 hypothetical protein

ORF02238 hypothetical protein 

ORF02239 hypothetical protein 

ORF02241 hypothetical protein 

ORF02244 hypothetical protein 

ORF02245 hypothetical protein 

ORF02263 hypothetical protein 

ORF02268 protease, putative

ORF02271 hypothetical protein 

ORF02279 hypothetical protein 

ORF02283 chromosome assembly-related protein 

ORF02317 hypothetical protein 

ORF02330 hypothetical protein 

ORF02344 site-specific recombinase, phage integrase family 

ORF02345 conserved hypothetical protein 

ORF02347 hypothetical protein 

ORF02349 conserved hypothetical protein 

ORF02350 hypothetical protein 

ORF02351 transcriptional regulator, Cro/CI family

ORF02354 hypothetical protein

ORF02356 expressed putative secreted protein

ORF02395 transcriptional regulator, Cro/CI family

ORF02418 hypothetical protein 

ORF02422 hypothetical protein 

ORF02425 conserved hypothetical protein 

ORF03004 conserved hypothetical protein

ORF03005 cylX protein 

ORF03006 Tn916, hypothetical protein 

ORF03007 Tn916, hypothetical protein

ORF03008 Tn916, hypothetical protein

ORF03009 Tn916, tetM leader peptide

ORF03010 Tn916, hypothetical protein

ORF03015 conjugal transfer protein, interruption-N

Table 13: Comparative Sequences relating to SAG0466 (thiolase)

SEQ ID NO. 1301: SAG0466 FROM THE 2603V/R GBS STRAIN

CTCCTGCCCCTGCAATGGCAGTTAGACCCATAGGTTTATTTTTATATTTTAATGCCTGCATAAGATGAAGGATATTAATA 
ATTCCTGAGCAGGCATAAGGGTGTCCGTAAGCTAATGTCCCTCCAAAAATATTGAATTTTTCTCTCTCTTCAGGATAATA 
ATGATTAAATAGAGCATCAATCGCTGCAAATGGTTCATTCCATTCAATTGCATCATAATCCGATATTTTAGTATGAGTTT 
CTGTTAATAGTTTTTCCGTAGCCGTGTGAACCAATTCTGGACTAAGCTTGGGATCTCCTGCTACTTCTACAATGTGAACA 
ATCCGGAATTCTGTTTTCTGACTCTGAAGCGTTAGAAATGCAGCAGCATCGTGCATTAAACAAACATTTCCAATAGTGAG 
CAAAGGTGAATTTTCCATCAATCTTGGTAATTTTTGAAAAAATGTTtCTTTTaGTTTTCTAACGCCTTGATCTCGCATCC 
CTTCCATTGGTAAGATTACyTCTTCTAAATAGCCACCTTGTTTAGCTGTTAAGGCGCGTTTATGGCTCAAGAATGCCAAT 
TTATCTAACATTTCTCTTCTAAAaCCATATTTTTGACAGACTCTCTGGGCCCCTTCTAACATTACAGTTTCAGCATAAGA 
GTCAGGAGAAAACTGAGCAACTGTATATTCTCCGTTACGATTATCTTCTTTAGCATAACGTCTCATAGGTTGAAGAGAAC 
TACTTTCAATCCCCCCAACAAGAACTTTTTCATTAATACCGGTACTGATTTTTAGATAACCAAAAAACAAGGCAGAACTT 
GATGAAGCACACTGCATATCAATCGTTTGTACTGGAATATAGGATTCATAATCAGAAAAAAGAGTCATCAAACGACCAAT
ATTGCCCCCAGTACCAACTGTGTTCCCACAAATAATACTATCAATGTTAGATTCTGATTCTATTTTTTTTATTTGATTTA
AAAGGTGTGCTCCTAAAAGTTCTGGACGGTAAGTTTAAATTGCTT 

SEQ ID NO. 1302: SAG0466 FROM THE M732 GBS TYPE III STRAIN 

TCGGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAATAAAAAAAATA 
GAATCAGAATCTAATATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCT 
TTTTTCTGATTATGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTG 
GTTATCTAAAAATCAGTGCCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGA
CGTTACGCTAAAGAAGATAATCGTAACGGAGAATATACCGTTGCTCAGTTTTCTCCTGACTCTTATGCTGAAACTGTAAT 
GTTAGAAGGGGCACAAAGAGTCTGTCAAAAATATGGTTTTAGAAGAGAAATGTTAGATAAATTGGCATTCTTGAGCCATA 
AACGCGCCTTAACAGCTAAACAAGGTGGCTATTTAGAAGAGGTAATCTTACCAATGGAAGGGATGCGAGATCAAGGCGTT 
AGAAAACTAAAAGAAGCATTTTTTCAAAAATTACCAAGATTGATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTG 
TTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTGTAGAAGTAG 
CAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAACTCATACTAAAATATCG 
GATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTTTATTTAATCATTATTATCCTGAAGAGAGAGA
AAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGGAATTA 

SEQ ID NO. 1303: SAG0466 FROM THE 090 GBS TYPE Ia STRAIN

TTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTATGAATCCTATATTCCAG 
TACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAGTGCCGGTATTAAT 
GAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTACGCTAAAGAAGATAATCGTAACGG 
AGAATATACCGTTGCTCAGTTTTCTCCTGACTCTTAkGCTGAAACTGTAATGtTAGAAGGGGCACAAAGAGTCTGTCAAA 
AATATGGTTTtAGAAGAGAAATGTTAGATAAATTGGCATTCTTGAGCCATAAACGCGCCTTAACAGCTAAACAAGGTGGC 
TATTTAGAAGAGGTAATCTTACCAATGGAAGGGATGCGAGATCAAGGCGTTAGAAAACTAAAAGAAGCATTTTTTCAAAA 
ATTACCAAGATTGATGGrAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTwA 
CGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTG 
GTTCACACGGCTACGGAAAAACTATTAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACC
ATTTGCAGCGATTGATGCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAG 
CTTACGGACACCCTTATGCCTGCTCAGG 

SEQ ID NO. 1304: SAG0466 FROM THE COH1 GBS TYPE Ia STRAIN

ATCGGTATAAAAGGGAAGCAATTTAAAATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAATAAAAAAAATA 
GAATCAGAATCTAATATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCT 

TTTTTCTGATTATGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTG 
GGTATCTAAAAA

SEQ ID NO. 1305: SAG0466 FROM THE CJB GBS NONTYPEABLE STRAIN REVERSE
COMPLEMENT 

TTTTCAAAAATTACCAAGATTGATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTG 
CATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGT
CCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATG
GAATGAACCATTTGCAGCGATTGATGCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAG
GGGCATTAGCTTACGGACACCCTTAATGCCTGCTCAGGAATTATTAATATCC 

SEQ ID NO. 1306: SAG0466 FROM THE CJB110 GBS NONTYPEABLE STRAIN

GGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAATAAAAAAAATATA 
ACCAGAATCTAACATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTT 
TTTCTGATTATGAATCCTATATTC

SEQ ID NO. 1307: SAG0466 FROM THE 1169NT1 GBS TYPE V STRAIN REVERSE COMPLEMENT 

CAAGATTGATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTT 

CAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCA 

CACGGCTACGGAAAAACTATTAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTG 

CAGCGATTGATGCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTAC 

GGACACCCTTATGCCTGCTCAGGAATTATTAATATCCTTCATCTTATGCAGGCATTAAAATATAAAAATAAACCTATGGG 

CCTAACTGCCATTGCAGGGGCA 

SEQ ID NO. 1308: SAG0466 FROM THE 18RS21 GBS TYPE II STRAIN 

CCTTAACAGTTAAACAAGGTGGCTATTTAGAAGAGGTAATCTTACCAATGGAAGGGATGCGAGATCAAGGCGTTAGAAAA
CTAAAAGAAACATTTTTTCAAAAATTACCAAGATTGATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAAT
GCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAG
ATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAACTCATACTAAAATATCGGATTAT
GATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTCTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATT
CAATATTTTTGGAGGGACATTAGCTTACGGACACCCTTATGCCTGCTCAGGAATTATTAATATCCTTCATCTTATGCAGG
CATTAAAATATAAAAATAAACCTATGGGTCTAACTGCCATTGCAGGGGCAG

SEQ ID NO. 1309: SAG0466 FROM THE 18RS21 GBS TYPE II STRAIN 

TCGGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTTTTAAATCAAATAAAAAAAATA
GAATCAGAATCTAACATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCT 
TTTTTCTGATTATGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTG 
GTTATCTAAAAATCAGTACCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGA 
CGTTATGCTAAAGAAGATAATCGTAACGGAGAATATACAGTTGCTCAGTTTTCTCCTGACTCTTATGCTGAAACTGTAAT 
GTTAGAAGGGGCCCAGAGAGTCTGTCAAAAATATGGTTTTAGAAGAGAAATGTTAGATAAATTGGCATTCTTGAGCCATA 
AACGCGCCTTAACAGCTAAACA 

SEQ ID NO. 1310: SAG0466 FROM THE H36b GBS TYPE Ib STRAIN

TTTGGGCTACGAACACCTATCGGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTTTT 
AAATCAAATAAAAAAAATAGAATCAGAATCTAACATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATA 
TTGGTCGTTTGATGACTCTTTTTTCTGATTATGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCA 
AGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAGTACCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAG
TTCTCTTCAACCTATGAGACGTTATGCTAAAGAAGATAATCGTAACGGAGAATATACAGTTGCTCAGTTTTCTCCTGACT
CTTATGCTGAAACTGTAATGTTAGAAGGGGCCC 

SEQ ID NO. 1311: SAG0466 FROM THE H36b GBS TYPE Ib STRAIN REVERSE COMPLEMENT

GAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAA 

AACAGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGG 

AAAAACTATTAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGAT

GCTCTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGACATTAGCTTACGGACACCCTTA 

TGCCTGCTCAGGAATTATTAATATCCTTCATCTTATGCAGGCATTAAAATATAAAAATAAACCTATGGGTCTAACTGCCA 

TTGCAGGGGCAGGA 

SEQ ID NO. 1312: SAG0466 FROM THE M781 GBS TYPE III STRAIN REVERSE COMPLEMENT

CCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATT 

CCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTAT

TAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTTTATTT 

AATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTC 

AGGAATTATTAATATCCTTCATCTTATGCAGGCATTAAAATATAAAAATAAACCTATGGGTTCTAACTGC

SEQ ID NO. 1313: SAG0466 FROM THE M781 GBS TYPE III STRAIN 

GCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAATAAAAAAAATAGAATCAGAATCTAATA
TTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTATGAA 
TCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAG 
TGCCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTACGCTAAAGAAG 
ATAATCGTAACGGAGAATATACCGTTGCTCAGTTTTCTCCTGACTCTTATGCTGAAACTGTAATGTTAGA 

SEQ ID NO. 1314: SAG0466 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

CCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATT 

CCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTAT

TAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTCTATTT 

AATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGACATTAGCTTACGGACACCCTTATGCCTGCTC 

AGGAATTATTAATATCCTTCATCTTATGCAGGCATTAAAATATAAAAATAAACCTATGGGTCTAACTGCCATTGCAGGGG 

C

SEQ ID NO. 1315: SAG0466 FROM THE JM9130013 GBS TYPE VIII STRAIN REVERSE 
COMPLEMENT 

GCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGA 
TTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACA 
GAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTCTATTTAATCA 
TTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGGAA
TTATTAATATCCTTCATCTTATGCAGGCATTAAAATATAAAAATAAACCTATGGGTCTAACTGCCATTGCAGGGGCAGGA

SEQ ID NO. 1316: SAG0466 FROM THE JM9130013 GBS TYPE VIII STRAIN 

TTTGGGCTACGAACACCTATCGGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTTTT
AAATCAAATAAAAAAAATAGAATCAGAATCTAACATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATA 
TTGGTCGTTTGATGACTCTTTTTTCTGATTATGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCA 
AGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAGTACCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAG 
TTCTCTTCAACCTATGAGACGTTATGCTAAAGAAGATAATCGTAACGGAGAATATA 
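
Several entries in this table are reported as the reverse complement of the strand that was read. As an illustrative sketch only (the function name `reverse_complement`, and the way IUPAC ambiguity codes and alignment gaps are handled, are assumptions made here and are not part of the disclosure), such an entry can be converted to the opposite strand before it is compared with a forward-strand entry:

```python
# Complement table for the four bases, the IUPAC ambiguity codes, and the
# gap character used in the alignments ("-" is left unchanged).
_COMPLEMENT = str.maketrans(
    "ACGTRYKMSWBDHVN-",
    "TGCAYRMKSWVHDBN-",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string.

    Whitespace (for example line breaks inside a listed sequence) is
    removed and letters are upper-cased before conversion.
    """
    cleaned = "".join(seq.split()).upper()
    return cleaned.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # First twelve bases of SEQ ID NO. 1315 as listed (reverse complement).
    print(reverse_complement("GCTCACTATTGG"))
```

Applying the function twice returns the cleaned input, which gives a quick sanity check when transcribing entries by hand.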

SEQ1301 CTCCTGCCCCTGCAATGGCAGTTAGACCCATAGGTTTATTTTTATATTTTA 

SEQ1302 

SEQ1303 

SEQ1304 

SEQ1305 

SEQ1306 

SEQ1307 

SEQ1308 CTTAACAGTTAAACAAGGTGGCTATTTAGAAGAGGTAATCTTACCAATGGAAGGGATGC 

SEQ1309 

SEQ1310 

SEQ1311 

SEQ1312 

SEQ1313 

SEQ1314 

SEQ1315 

SEQ1316 



SEQ1301 TGCCTGCATAAGATGAAGGATATTAATAATTCCTGAGCAGGCATAAGGGTGTCCGTAAG 

SEQ1302 TCGGTATAAA 

SEQ1303 

SEQ1304 ATCGGTATAAA 

SEQ1305 TTTTCAAAAATTACCAAGATTGATGG 

SEQ1306 GGTATAAA 

SEQ1307 CAAGATTGATGG 

SEQ1308 AGATCAAGGCGTTAGAAAACTAAAAGAAACATTTTTTCAAAAATTACCAAGATTGATGG

SEQ1309 TCGGTATAAA 

SEQ1310 TTTGGGCTACGAACACCTATCGGTATAAA 

SEQ1311 G 

SEQ1312 

SEQ1313 

SEQ1314 

SEQ1315

SEQ1316 TTTGGGCTACGAACACCTATCGGTATAAA

SEQ1301 TAATGTCCCTCCAAA-AATATTGAATTTTTCTCTCTC-TTCAGGATAATAATGATTAAA

SEQ1302 GGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAAT 

SEQ1303 

SEQ1304 GGGAAGCAATTTAAA-ATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAAT

SEQ1305 AAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTC

SEQ1306 GGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAAT

SEQ1307 AAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTC

SEQ1308 AAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTC

SEQ1309 GGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTTTTAAATCAAAT

SEQ1310 GGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTTTTAAATCAAAT

SEQ1311 AAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTC 

SEQ1312 CCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTC 

SEQ1313 GCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAAT 

SEQ1314 CCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTC 

SEQ1315 GCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTC 

SEQ1316 GGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTTTTAAATCAAAT

SEQ1301 AGAGCATCAATCGCTGCAAATGGTTCATTCC-ATTCAATTGCATCATAATCCGATATTT
SEQ1302 AAAAAAATAGAATCAGAATCTAAT--ATT---GATAGTATTATTTGTGGGAACA-CAGT
SEQ1303 TTGTGGGAACA-CAGT
SEQ1304 AAAAAAATAGAATCAGAATCTAAT--ATT---GATAGTATTATTTGTGGGAACA-CAGT
SEQ1305 AACGCTTCAGAGTCAGAAAACAGA--ATTCCGGATTGTTCACATTGTAGAAGTAGCAGG
SEQ1306 AAAAAAATATAACCAGAATCTAAC--ATT---GATAGTATTATTTGTGGGAACA-CAGT
SEQ1307 AACGCTTCAGAGTCAGAAAACAGA--ATTCCGGATTGTTCACATTGTAGAAGTAGCAGG
SEQ1308 AACGCTTCAGAGTCAGAAAACAGA--ATTCCGGATTGTTCACATTGTAGAAGTAGCAGG
SEQ1309 AAAAAAATAGAATCAGAATCTAAC--ATT---GATAGTATTATTTGTGGGAACA-CAGT
SEQ1310 AAAAAAATAGAATCAGAATCTAAC--ATT---GATAGTATTATTTGTGGGAACA-CAGT
SEQ1311 AACGCTTCAGAGTCAGAAAACAGA--ATTCCGGATTGTTCACATTGTAGAAGTAGCAGG
SEQ1312 AACGCTTCAGAGTCAGAAAACAGA--ATTCCGGATTGTTCACATTGTAGAAGTAGCAGG
SEQ1313 AAAAAAATAGAATCAGAATCTAAT--ATT---GATAGTATTATTTGTGGGAACA-CAGT
SEQ1314 AACGCTTCAGAGTCAGAAAACAGA--ATTCCGGATTGTTCACATTGTAGAAGTAGCAGG
SEQ1315 AACGCTTCAGAGTCAGAAAACAGA--ATTCCGGATTGTTCACATTGTAGAAGTAGCAGG
SEQ1316 AAAAAAATAGAATCAGAATCTAAC--ATT---GATAGTATTATTTGTGGGAACA-CAGT

SEQ1301 AGTATGAGTTTCTGTTAATAGTTTTTCCGTAGCCGTGTGAACCAATTCTGGACTAAGCT
SEQ1302 GGTACTGGGGGCAATATTGG-TCGTTTGATGACTCTTTTTTCTGATTATGAATCCTA--
SEQ1303 GGTACTGGGGGCAATATTGG-TCGTTTGATGACTCTTTTTTCTGATTATGAATCCTA--
SEQ1304 GGTACTGGGGGCAATATTGG-TCGTTTGATGACTCTTTTTTCTGATTATGAATCCTA--
SEQ1305 GATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAAC
SEQ1306 GGTACTGGGGGCAATATTGG-TCGTTTGATGACTCTTTTTTCTGATTATGAATCCTA--
SEQ1307 GATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAAC
SEQ1308 GATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAAC
SEQ1309 GGTACTGGGGGCAATATTGG-TCGTTTGATGACTCTTTTTTCTGATTATGAATCCTA--
SEQ1310 GGTACTGGGGGCAATATTGG-TCGTTTGATGACTCTTTTTTCTGATTATGAATCCTA--
SEQ1311 GATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAAC
SEQ1312 GATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAAC
SEQ1313 GGTACTGGGGGCAATATTGG-TCGTTTGATGACTCTTTTTTCTGATTATGAATCCTA--
SEQ1314 GATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAAC
SEQ1315 GATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAAC
SEQ1316 GGTACTGGGGGCAATATTGG-TCGTTTGATGACTCTTTTTTCTGATTATGAATCCTA--

SEQ1301 GGGATCTCCTGCTACTTCTACAATGTGAACAATCCGGA-ATTCTGTTTTCTGACTCTGA
SEQ1302 TATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGT--TCTGCCTTGTTTTT
SEQ1303 TATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGT--TCTGCCTTGTTTTT
SEQ1304 TATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGT--TCTGCCTTGTTTTT
SEQ1305 CATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGA
SEQ1306 TATTC
SEQ1307 CATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGA
SEQ1308 CATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGA
SEQ1309 TATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGT--TCTGCCTTGTTTTT
SEQ1310 TATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGT--TCTGCCTTGTTTTT
SEQ1311 CATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGA
SEQ1312 CATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGA
SEQ1313 TATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGT--TCTGCCTTGTTTTT
SEQ1314 CATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGA
SEQ1315 CATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGA
SEQ1316 TATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGT--TCTGCCTTGTTTTT

SEQ1301 GCGTTAGAAATGCAGCAGCATCGTGCATTAAACAAACATTTC--CAATAGTGAGCAAAG
SEQ1302 GGT-TATCTAAAAATCAGTG-CCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAA
SEQ1303 GGT-TATCTAAAAATCAGTG-CCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAA
SEQ1304 GGG-TATCTAAAAA
SEQ1305 GCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGC
SEQ1306
SEQ1307 GCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGC
SEQ1308 GCTCTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGAC
SEQ1309 GGT-TATCTAAAAATCAGTA-CCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAA
SEQ1310 GGT-TATCTAAAAATCAGTA-CCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAA
SEQ1311 GCTCTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGAC
SEQ1312 GCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGC
SEQ1313 GGT-TATCTAAAAATCAGTG-CCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAA
SEQ1314 GCTCTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGAC
SEQ1315 GCTCTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGC
SEQ1316 GGT-TATCTAAAAATCAGTA-CCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAA

SEQ1301 TGAATTTTCCATCAATCTTGG--TAATTTTTGAAAAAATGTTTCTTTTAGTTTTCTAAC
SEQ1302 GTAGTTCTCTTCAACCTATGAGACGTTACGCTAAAGAAGATAATCGTAACGGAGAATAT
SEQ1303 GTAGTTCTCTTCAACCTATGAGACGTTACGCTAAAGAAGATAATCGTAACGGAGAATAT
SEQ1304
SEQ1305 TTAGCTTACGGACACCCTTAA--TGCCTGCTCAGGAATTATTAATATCO
SEQ1306
SEQ1307 TTAGCTTACGGACACCCTTA---TGCCTGCTCAGGAATTATTAATATCCTTCATCTTAT
SEQ1308 TTAGCTTACGGACACCCTTA---TGCCTGCTCAGGAATTATTAATATCCTTCATCTTAT
SEQ1309 GTAGTTCTCTTCAACCTATGAGACGTTATGCTAAAGAAGATAATCGTAACGGAGAATAT
SEQ1310 GTAGTTCTCTTCAACCTATGAGACGTTATGCTAAAGAAGATAATCGTAACGGAGAATAT
SEQ1311 TTAGCTTACGGACACCCTTA---TGCCTGCTCAGGAATTATTAATATCCTTCATCTTAT
SEQ1312 TTAGCTTACGGACACCCTTA---TGCCTGCTCAGGAATTATTAATATCCTTCATCTTAT
SEQ1313 GTAGTTCTCTTCAACCTATGAGACGTTACGCTAAAGAAGATAATCGTAACGGAGAATAT
SEQ1314 TTAGCTTACGGACACCCTTA---TGCCTGCTCAGGAATTATTAATATCCTTCATCTTAT
SEQ1315 TTAGCTTACGGACACCCTTA---TGCCTGCTCAGGAATTATTAATATCCTTCATCTTAT
SEQ1316 GTAGTTCTCTTCAACCTATGAGACGTTATGCTAAAGAAGATAATCGTAACGGAGAATAT

SEQ1301 CCTTGATCTCGCATCCCTTCCATTGGTAAGATTACYTCTTCTAAATAGCCACCTTGTTT
SEQ1302 CCGTTGCTCAGTTTTCTCCTGACTCTTATGCTG--AAACTGTAATGTTAGAAGGGGCAC
SEQ1303 CCGTTGCTCAGTTTTCTCCTGACTCTTAKGCTG--AAACTGTAATGTTAGAAGGGGCAC
SEQ1304
SEQ1305
SEQ1306
SEQ1307 CAGGCATTAAAATATAAAAATAAACCTATGGGC-CTAACTGCCATTGCAGGGGCA
SEQ1308 CAGGCATTAAAATATAAAAATAAACCTATGGGT-CTAACTGCCATTGCAGGGGCAG
SEQ1309 CAGTTGCTCAGTTTTCTCCTGACTCTTATGCTG--AAACTGTAATGTTAGAAGGGGCCC
SEQ1310 CAGTTGCTCAGTTTTCTCCTGACTCTTATGCTG--AAACTGTAATGTTAGAAGGGGCCC
SEQ1311 CAGGCATTAAAATATAAAAATAAACCTATGGGT-CTAACTGCCATTGCAGGGGCAGGA-
SEQ1312 CAGGCATTAAAATATAAAAATAAACCTATGGGTTCTAACTGC
SEQ1313 CCGTTGCTCAGTTTTCTCCTGACTCTTATGCTG--AAACTGTAATGTTAGA
SEQ1314 CAGGCATTAAAATATAAAAATAAACCTATGGGT-CTAACTGCCATTGCAGGGGC
SEQ1315 CAGGCATTAAAATATAAAAATAAACCTATGGGT-CTAACTGCCATTGCAGGGGCAGGA-
SEQ1316



SEQ1301 
SEQ1302 
SEQ1303 
SEQ1304 
SEQ1305 
SEQ1306 
SEQ1307 
SEQ1308 
SEQ1309 
SEQ1310 
SEQ1311 
SEQ1312 
SEQ1313 
SEQ1314 
SEQ1315 
SEQ1316 

SEQ1301 
SEQ1302 
SEQ1303 
SEQ1304 
SEQ1305 
SEQ1306 
SEQ1307 
SEQ1308 
SEQ1309 
SEQ1310 
SEQ1311 
SEQ1312 
SEQ1313 
SEQ1314 
SEQ1315 
SEQ1316 

SEQ1301 
SEQ1302 
SEQ1303 
SEQ1304 
SEQ1305 
SEQ1306 
SEQ1307 
SEQ1308 
SEQ1309 
SEQ1310 
SEQ1311 
SEQ1312 
SEQ1313 
SEQ1314 
SEQ1315 
SEQ1316 



GCTGTTAAGGCGCGTTTATGGCTCAAGAATGCCAATTTATCTAACATTTCTCTTCTAAA 
AAGAGTCTGTCAAAAATATGGTTTTAGAAGAGAAATGTTAGATAAATTGGCATTCTTGA 
AAGAGTCTGTCAAAAATATGGTTTTAGAAGAGAAATGTTAGATAAATTGGCATTCTTGA



GAGAGTCTGTCAAAAATATGGTTTTAGAAGAGAAATGTTAGATAAATTGGCATTCTTGA 



CCATATTTTTGACAGACTCTCTGGGCCCCTT--CTAACATTACAGTTTCAGCATAAGAG
CCATAAACGCGCCTTAACAGCTAAACAAGGTGGCTATTTAGAAGAGGTAATCTTACCAA 
CCATAAACGCGCCTTAACAGCTAAACAAGGTGGCTATTTAGAAGAGGTAATCTTACCAA 



CCATAAACGCGCCTTAACAGCTAAACA- 



CAGGAGAAAACTGAGCAACTGTATATTCTCCGTTACGATTATCTTCTTTAGCATAACGT 
GGAAGGGATGCGAGATCAAGGCGTTAGAAAACTAAAAGAAGCATTTTTTCAAAAATTAC
GGAAGGGATGCGAGATCAAGGCGTTAGAAAACTAAAAGAAGCATTTTTTCAAAAATTAC 






SEQ1301 
SEQ1302 
SEQ1303 
SEQ1304 
SEQ1305 
SEQ1306 
SEQ1307 
SEQ1308 
SEQ1309 
SEQ1310 
SEQ1311 
SEQ1312 
SEQ1313 
SEQ1314 
SEQ1315 
SEQ1316 

SEQ1301 
SEQ1302 
SEQ1303 
SEQ1304 
SEQ1305 
SEQ1306 
SEQ1307 
SEQ1308 
SEQ1309 
SEQ1310 
SEQ1311 
SEQ1312 
SEQ1313 
SEQ1314 
SEQ1315 
SEQ1316 

SEQ1301 
SEQ1302 
SEQ1303
SEQ1304 
SEQ1305 
SEQ1306 
SEQ1307 
SEQ1308 
SEQ1309 
SEQ1310
SEQ1311 
SEQ1312 
SEQ1313 
SEQ1314 
SEQ1315 
SEQ1316 



TCATAGGTTGAAGAGAACTACTTTCAATCCCCCCAACAAGAACTTTTTCATTAATACCG 
AAGATTGATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATG 
AAGATTGATGGRAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATG 



TACTGATTTTTAGATAACCAAAAAAC--AAGGCAGAACTTGATGAAGCACACTGCATAT
TGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTGTAG 
TGCTGCATTTCTWACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTGTAG 



AATCGTTTGTACTGGAATATAGGATTCATAATCAGAAAAAAGAGTCATCAAACGACCAA 
AGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTAT
AGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTAT 






SEQ1301 
SEQ1302 
SEQ1303 
SEQ1304 
SEQ1305 
SEQ1306 
SEQ1307 
SEQ1308 
SEQ1309 
SEQ1310 
SEQ1311 
SEQ1312 
SEQ1313 
SEQ1314 
SEQ1315
SEQ1316 

SEQ1301 
SEQ1302 
SEQ1303 
SEQ1304 
SEQ1305 
SEQ1306 
SEQ1307 
SEQ1308 
SEQ1309 
SEQ1310 
SEQ1311 
SEQ1312 
SEQ1313 
SEQ1314 
SEQ1315 
SEQ1316 

SEQ1301 
SEQ1302 
SEQ1303 
SEQ1304 
SEQ1305 
SEQ1306 
SEQ1307 
SEQ1308 
SEQ1309 
SEQ1310 
SEQ1311 
SEQ1312 
SEQ1313 
SEQ1314 
SEQ1315 
SEQ1316 



ATTGCCCCCAGTACCAACTGTGTTCCCACAAATAATACTATCAATGTTAGATTCTGATT 
AACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTG 
AACAGAAACTCATACTAAAATATCGGATXATGATGCAATTGAATGGAATGAACCATTTG 

TATTTTTTTTATTTGATTTAAAAGGTGTGCTCCTAAAAGTTCTGGACGGTAAGTTTAAA 
AGCGATTGATGCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTT 
AGCGATTGATGCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTT

TGCTT 

TGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGGAATTA 
TGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGG 
Table 14: Comparative Sequences relating to SAG0471 (glucokinase)



SEQ ID NO. 1401: SAG0471 FROM THE 18RS21 GBS TYPE II STRAIN
TTAAATTTGGTATCTTGACGCTTGAGGGAGAAGTACAAGAAAAATGGGCAATTGAGACC^^ 

TCTGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGG 
AGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAAAAAG 
AAGTTGGAATTCCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAATCCCGAC 
GTTGTTTTCGTAACCCTCGGAACAGGAGTAGGTGGAGGTGTTATCGCAGATGGTAACCTCATCCATGGTGTTGCAGGAGCAGGTGGAGA 
AATTGGGCATATGATTGTTGATCCAGAAAATGGATTTACGTGCACATGTGGTAACAAAGGCTGCCTTGAGACAGTTGCATCAGCGACAG
GTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATGAGGGTTCGTCTGCO

AGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGTTACCTTGGACTGGCAGC 
AGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGCGTTG 
AGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTAAAATTAAGAT 



SEQ ID NO. 1402: SAG0471 FROM THE 090 GBS TYPE Ia STRAIN

CGTTTCTGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTC
CAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTTATTGAA
AAAGAAGTTGGAATTCCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGA^
CGATGTTGTTTTCGTAACCCTCGGAACAGGAGTAGGTGGAGGTGTTATCGCAGATGGTAACCTCATCCATGGTGTTGCAGGAGCAGGTG
GAGAAATTGGGCATATGATTGTTGATCCAGAKAATGGATTTACGTGCACATGTGGTAACAAAGGCTGTCTTGAGACAGTTGCAT
ACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATGAAGGTTCGTCTGCCATTAAAGCAGCGATTGACAACGGTGATACTGT
TACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGTTACCTTGGACTGG
CAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGC
GTTGAGAAATACTTTGTCACATTTG

SEQ ID NO. 1403: SAG0471 FROM THE COH1 GBS TYPE Ia STRAIN

ACAAGAAAAATGGGCAATTGAGACCAATACTTTAGAAAACGGAAGACA 

GCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGT 
GCTTTTAATCTAAATTGGGCTGATACTCAAGA 

SEQ ID NO. 1404: SAG0471 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

TTGGTATCTTGACGCTTGAGGAGAAGTACAAGAAAAATGGGCAATTGAGACC^TACTTTAGAAAACGGAAGACATATCGTTTCTGATA 
TCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGGTCTCCAGGAGCTGTT 
GATAGAACTAGTAAAAC 

SEQ ID NO. 1405: SAG0471 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

CACCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGC 
GTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTA 

SEQ ID NO. 1406: SAG0471 FROM THE 2603V/R GBS TYPE V STRAIN

GGGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATC
TTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTG

SEQ ID NO. 1407: SAG0471 FROM THE H36b GBS TYPE Ib STRAIN

GGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGAT
TAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAA

AATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAAAAAGAAGTTGGAATTCCATTTTTTATTGATAACGATGCTAATGTTGCA 
ACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAATCCCGACGTTGTTTTCGTAACC 

SEQ ID NO. 1408: SAG0471 FROM THE H36 GBS TYPE Ib STRAIN REVERSE COMPLEMENT
GAGACAGTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACT

TGACAACGGTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAAC

CACGTTACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGT 

GAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACA 

SEQ ID NO. 1409: SAG0471 FROM THE M732 GBS TYPE III STRAIN 
ACAAGAAAAATGGGCAATTGAGACCATACTTAGAAAACGGAAGAC^ 

CTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGC 

TTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTTATTGAAAAAGAAGT^ 

ATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAATC

GGTGTTATCGCAGATGGTAACCTCATCCATGGTGTTGCAAGAGCAGGTGGAGAAATTGGGCATATGATT 

SEQ ID NO. 1410: SAG0471 FROM THE M732 GBS TYPE III STRAIN REVERSE COMPLEMENT 

CAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAAGTCA 

ATTGCTGAACTAGGTAATGAT 

SEQ ID NO. 1411: SAG0471 FROM THE M781 GBS TYPE III STRAIN 

AGAAGTACAAGAAAATGGGCAATTGAGACCATACTTAGAAAACGGAAGAC^ 

TGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACA 

GGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTTATTGAAAAAGAAGTTGGAATTCCATTTTTO 

TGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAA^



SEQ ID NO. 1412: SAG0471 FROM THE M781 GBS TYPE III STRAIN REVERSE COMPLEMENT 
GATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAG^^ 

TGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGAAT 
GTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAA 

SEQ ID NO. 1413: SAG0471 FROM THE 090 GBS TYPE Ia STRAIN

AAATTTGGTATCTTGACGCTTGAGGGAGAAGTACAAGAAAAATGGGCATTGAGACCATACTTAGAAAACGGAAGACATATCGTTTCTGA
TATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTG
TTGATAGAACTAGTAAAACAGTAACAGGTGCTTTTAATCTAAATTGGGCTGAT^

GGAATTCCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAATCCCGACG 
TTTCGTAACCCTCGGAACAGGAGTAGGTGGAGG 

SEQ ID NO. 1414: SAG0471 FROM THE 090 GBS TYPE Ia STRAIN REVERSE COMPLEMENT

GTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGTTAC
CTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATO

ACGTAGTCGCGTTGAGAAATACTTTATCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTAAAATTAAGATTG

SEQ ID NO. 1415: SAG0471 FROM THE JM9130013 GBS TYPE VIII STRAIN REVERSE COMPLEMENT 

GTTATCGCAGATGGTAACCTCATCCATGGTGTTGCAGGAGCAGGTGGAGAAATTGGGCATATGATTGTTGATCCAGAAAATGGATTTAC 
GTGCACATGTGGTAACAAAGGCTGCCTTGAGACAGTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATG
AGGGTTCGTCTGCCATTAAAGCAGCGATTGACCACGGTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAA 
TTTGCTAATTCTGTTGTTGAACGTGTATCACGTTACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGT
TATTGGTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAA 
AGTCAACTAA 

SEQ ID NO. 1416: SAG0471 FROM THE JM9130013 GBS TYPE VIII STRAIN REVERSE COMPLEMENT 

TGGTATCTTGACGCTTGAGGGAGAAGTACAAGAAAAATGGGCAATTGAGACCATACTTAGA
GTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTT^ 

TAGAACTAGTAAAACAGTCACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAAAAAGAAGCTGGAA 
TTCCATTTTTTATTG 

SEQ ID NO. 1417: SAG0471 FROM THE 2603V/R TYPE V GBS STRAIN REVERSE COMPLEMENT 

AGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAG^
TTGAGAAATACTTTGTCACATTTGTTTTCCCACAAGGT 

SEQ1401

SEQ1402 

SEQ1403 

SEQ1404 

SEQ1405 

SEQ1406 

SEQ1407 

SEQ1408 

SEQ1409 

SEQ1410 

SEQ1411

SEQ1412 

SEQ1413 

SEQ1414 

SEQ1415 TTATCGCAGATGGTAACCTCATCCATGGTGTTGCAGGAGCAGGTGGAGAAATTGGGCAT 

SEQ1416 

SEQ1417 



SEQ1401

SEQ1402 

SEQ1403 

SEQ1404 

SEQ1405 

SEQ1406 

SEQ1407 

SEQ1408 GAG 

SEQ1409 

SEQ1410 

SEQ1411 

SEQ1412 

SEQ1413 

SEQ1414 

SEQ1415 TGATTGTTGATCCAGAAAATGGATTTACGTGCACATGTGGTAACAAAGGCTGCCTTGAG 

SEQ1416 

SEQ1417

SEQ1401
SEQ1402
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407
SEQ1408 CAGTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATGAG
SEQ1409
SEQ1410
SEQ1411
SEQ1412
SEQ1413
SEQ1414
SEQ1415 CAGTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATGAG
SEQ1416
SEQ1417

SEQ1401
SEQ1402
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407
SEQ1408 GTTCGTCTGCCATTAAAGCAGCGATTGACAACGGTGATACTGTTACAAGTAAAGATATT
SEQ1409
SEQ1410
SEQ1411
SEQ1412 GATACTGTTACAAGTAAAGATATT
SEQ1413
SEQ1414 GTGATACTGTTACAAGTAAAGATATT
SEQ1415 GTTCGTCTGCCATTAAAGCAGCGATTGACCACGGTGATACTGTTACAAGTAAAGATATT
SEQ1416
SEQ1417

SEQ1401 TTAAATTTGGTATCTTGACGCTTGAGGGAGAAGTACAA
SEQ1402
SEQ1403 ACAA
SEQ1404 TTGGTATCTTGACGCTTGAGG-AGAAGTACAA
SEQ1405
SEQ1406
SEQ1407
SEQ1408 TTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGT
SEQ1409 ACAA
SEQ1410
SEQ1411 AGAAGTACAA
SEQ1412 TTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGT
SEQ1413 AAATTTGGTATCTTGACGCTTGAGGGAGAAGTACAA
SEQ1414 TTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGT
SEQ1415 TTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGT
SEQ1416 TGGTATCTTGACGCTTGAGGGAGAAGTACAA
SEQ1417



SEQ1401 AAAAATGGGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATC

SEQ1402 CGTTTCTGATATC 

SEQ1403 AAAAATGGGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATC 

SEQ1404 AAAAATGGGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATC

SEQ1405 CACCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATT 

SEQ1406 GGGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATC 

SEQ1407 GGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATC

SEQ1408 ACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATT 

SEQ1409 AAAAATGGGCAATTGAGACCA-TACTT-AGAAAACGGAAGACATATCGTTTCTGATATC

SEQ1410 

SEQ1411 AAAA-TGGGCAATTGAGACCA-TACTT-AGAAAACGGAAGACATATCGTTTCTGATATC 

SEQ1412 ACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATT 

SEQ1413 AAAAATGGGCA-TTGAGACCA-TACTT-AGAAAACGGAAGACATATCGTTTCTGATATC 

SEQ1414 ACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATT 

SEQ1415 ACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATT 

SEQ1416 AAAAATGGGCAATTGAGACCA-TACTT-AGAAAACGGAAGACATATCGTTTCTGATATC 

SEQ1417 AGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATT

SEQ1401 TTGAATCTCTCA-AACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGG

SEQ1402 TTGAATCTCTCA-AACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGG

SEQ1403 TTGAATCTCTCA-AACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGG 

SEQ1404 TTGAATCTCTCA-AACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGG 

SEQ1405 GTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTT

SEQ1406 TTGAATCTCTCA-AACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGG 

SEQ1407 TTGAATCTCTCA-AACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGG 

SEQ1408 GTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTT 

SEQ1409 TTGAATCTCTCA-AACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGG 

SEQ1410 CAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTT 

SEQ1411 TTGAATCTCTCA-AACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGG

SEQ1412 GTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTT 

SEQ1413 TTGAATCTCTCA-AACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGG 

SEQ1414 GTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTT 

SEQ1415 GTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTT 

SEQ1416 TTGAATCTCTCA-AACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGG

SEQ1417 GTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTT

SEQ1401 ATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTT

SEQ1402 ATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTT 

SEQ1403 ATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTT

SEQ1404 ATCGGTATGGGGTCTCCAGGAGCTGTTGATAGAACTAGTAAAAC 

SEQ1405 GTCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTA 

SEQ1406 ATCGGTATGGGTTCTCCAGGAGCTG 

SEQ1407 ATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTT

SEQ1408 GTCACATTTGCTTTCCCACA 

SEQ1409 ATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTT 

SEQ1410 GTCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTAAAATTAAGATTGCTGAACTAGG 

SEQ1411 ATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTT 

SEQ1412 GTCACATTTGCTTTCCCACAAGTTAAAAA 

SEQ1413 ATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTT 

SEQ1414 ATCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTAAAATTAAGATTG

SEQ1415 GTCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTAA

SEQ1416 ATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTCACAGGTGCTTT 

SEQ1417 GTCACATTTGTTTTCCCACAAGGT 

SEQ1401 AATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAAAAAGAAGTTGGAAT
SEQ1402 AATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTTATTGAAAAAGAAGTTGGAAT
SEQ1403 AATCTAAATTGGGCTGATACTCAAGA
SEQ1404
SEQ1405
SEQ1406
SEQ1407 AATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAAAAAGAAGTTGGAAT
SEQ1408
SEQ1409 AATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTTATTGAAAAAGAAGTTGGAAT
SEQ1410 AATGAT
SEQ1411 AATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTTATTGAAAAAGAAGTTGGAAT
SEQ1412
SEQ1413 AATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAAAAAGAAGTTGGAAT
SEQ1414
SEQ1415
SEQ1416 AATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAAAAAGAAGCTGGAAT
SEQ1417

SEQ1401 CCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGC
SEQ1402 CCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGC
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407 CCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGC
SEQ1408
SEQ1409 CCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGC
SEQ1410
SEQ1411 CCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGC
SEQ1412
SEQ1413 CCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGC
SEQ1414
SEQ1415
SEQ1416 CCATTTTTTATTG-
SEQ1417

SEQ1401 GGTGCCAATAATCCCGACGTTGTTTTCGTAACCCTCGGAACAGGAGTAGGTGGAGGTGT
SEQ1402 GGTGCCAATAATCCCGATGTTGTTTTCGTAACCCTCGGAACAGGAGTAGGTGGAGGTGT
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407 GGTGCCAATAATCCCGACGTTGTTTTCGTAACC--
SEQ1408
SEQ1409 GGTGCCAATAATCCCGATGTTGTTTTCGTAACCCTCGGAACAGGAGTAGGTGGAGGTGT
SEQ1410
SEQ1411 GGTGCCAATAATCCCGATGTTGTTTTCGTAACCCTCGGAACAGGAGTA
SEQ1412
SEQ1413 GGTGCCAATAATCCCGACGTTGTTTTCGTAACCCTCGGAACAGGAGTAGGTGGAGG
SEQ1414
SEQ1415
SEQ1416
SEQ1417

SEQ1401 ATCGCAGATGGTAACCTCATCCATGGTGTTGCAGGAGCAGGTGGAGAAATTGGGCATAT
SEQ1402 ATCGCAGATGGTAACCTCATCCATGGTGTTGCAGGAGCAGGTGGAGAAATTGGGCATAT
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407
SEQ1408
SEQ1409 ATCGCAGATGGTAACCTCATCCATGGTGTTGCAAGAGCAGGTGGAGAAATTGGGCATAT
SEQ1410
SEQ1411
SEQ1412
SEQ1413
SEQ1414
SEQ1415
SEQ1416
SEQ1417

SEQ1401 ATTGTTGATCCAGAAAATGGATTTACGTGCACATGTGGTAACAAAGGCTGCCTTGAGAC
SEQ1402 ATTGTTGATCCAGAKAATGGATTTACGTGCACATGTGGTAACAAAGGCTGTCTTGAGAC
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407
SEQ1408
SEQ1409 ATT-
SEQ1410
SEQ1411
SEQ1412
SEQ1413
SEQ1414
SEQ1415
SEQ1416
SEQ1417

SEQ1401 GTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATGAGGG
SEQ1402 GTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATGAAGG
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407
SEQ1408
SEQ1409
SEQ1410
SEQ1411
SEQ1412
SEQ1413
SEQ1414
SEQ1415
SEQ1416
SEQ1417

SEQ1401 TCGTCTGCCATTAAAGCAGCGATTGACACCGGTGATACTGTTACAAGTAAAGATATTTT
SEQ1402 TCGTCTGCCATTAAAGCAGCGATTGACAACGGTGATACTGTTACAAGTAAAGATATTTT
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407
SEQ1408
SEQ1409
SEQ1410
SEQ1411
SEQ1412
SEQ1413
SEQ1414
SEQ1415
SEQ1416
SEQ1417

SEQ1401 ATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGTTA
SEQ1402 ATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGTTA
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407
SEQ1408
SEQ1409
SEQ1410
SEQ1411
SEQ1412
SEQ1413
SEQ1414
SEQ1415
SEQ1416
SEQ1417

SEQ1401 CTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGG
SEQ1402 CTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGG
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407
SEQ1408
SEQ1409
SEQ1410
SEQ1411
SEQ1412
SEQ1413
SEQ1414
SEQ1415
SEQ1416
SEQ1417

SEQ1401 GGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCAC
SEQ1402 GGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCAC
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407
SEQ1408
SEQ1409
SEQ1410
SEQ1411
SEQ1412
SEQ1413
SEQ1414
SEQ1415
SEQ1416
SEQ1417

SEQ1401 TTTGCTTTCCCACAAGTTAAAAAGTCAACTAAAATTAAGAT
SEQ1402 TTTG
SEQ1403
SEQ1404
SEQ1405
SEQ1406
SEQ1407
SEQ1408
SEQ1409
SEQ1410
SEQ1411
SEQ1412
SEQ1413
SEQ1414
SEQ1415
SEQ1416
SEQ1417
Table 15: Comparative Sequences relating to SAG0492



SEQ ID NO. 1501: SAG0492 FROM THE 1169NT1 GBS NONTYPEABLE STRAIN 

TGACTTGGATATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTC

TGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGAATTGATATAACAGACAAAAAAAATGATATTTTTAAAATGCGC 
GGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAAGGGA 
TAAGCTTGATGCTCAGACAAAAGCATAGGAGCTACTTGAAAAA^ 
GAGGACAACAACAACGGATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATG 

CCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTT 
TGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCAGGCATTATTGTGAGCAAGGGACCCCTAAGGAAGTAT 

SEQ ID NO. 1502: SAG0492 FROM THE 18RS21 GBS TYPE II STRAIN

TTGGGAAAAATGAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCT
TCAACATTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAA 

TGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACT^ 

TATCACCTATTAAGACAAAGGGGCTTTCTAATCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAG 
GCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTCATGTCCTTCT 
TTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATG 

TGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGACGCAGAAATTAT 

SEQ ID NO. 1503: SAG0492 FROM THE 2603V/R GBS TYPE V STRAIN REVERSE COMPLEMENT 

AAAAATGAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAAC

ATTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATA 

TTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAA 

CCTATTAAGACAAAGGGGCTTTCTAATCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAAAAGTTGG 

TACTTATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTG 
ATGAACCTACTTGAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGAT^ 

ATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCAGGAATTATTGTTGAGCAAGGGGCCC

SEQ ID NO. 1504: SAG0492 FROM THE M781 GBS TYPE III STRAIN REVERSE COMPLEMENT

GAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTGGTGGTTATTA

AAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTA

AAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCT 

AAGACAAAGGGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAA 

TCCAGCAAGCTTATCTGGAGGACAACAACAACGGATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAAC 

CTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTATGACGATGGTTATTGTC 

ACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTGATTTTTATGGATGCAGOT 

AGTAT 

SEQ ID NO. 1505: SAG0492 FROM THE 090 GBS TYPE Ia STRAIN

TGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTT^ 

ACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCT 
ATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAA^ 

ACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCTAGCTTATCTGGAGGGCAACAAC^^ 
GCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAG
TGTTATGCAAGATTTAGCTAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGT^ 
TTTTTATGGATGCAGGCATTATTGTTgAsCAAGGGACCCCTAAGGAAGTA 

SEQ ID NO. 1506: SAG0492 FROM THE A909 GBS TYPE Ia STRAIN

CAATACAAGGACTTCATAAAAGTTTTGGGAAAAATGAGGTTTTAAAAGGCAT 

ATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTO 
GATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAA

TGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAAGGGGCTTTCTAAGCTTGATGCTCAGACAAAAGCATATGAGCTACTT
GAAAAAGTTGGACTCAAAGAGAAGGCTAATACTTATCCAGCTAGCTTATCT^ 
TGCAATGAATCCTGATGTCCTTCTTTTTGATGAACCTACTTCAGCTCTO 
ATTTAGCTAAATCTGGTATGACGATGGTTATTGTCACTCATGA 

GC^GGAATTATTGTgAGCAAGGGGCCCCTAAGGAAGTATTTGAGCAGACAAAAGAAATCCGCACA 

SEQ ID NO. 1507: SAG0492 FROM THE CJB110 GBS NONTYPEABLE STRAIN REVERSE COMPLEMENT
GACTTGGATATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGG 

GGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAA
GCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCT 
AAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCT 
AGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCA 

CTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTATC 
GCACGTGAAGTAGCGGATCGTGTCTTTTTATGGATGCGGGAATTATTGTGAGCAAGGGACC 




SEQ ID NO. 1508: SAG0492 FROM THE H36b GBS TYPE Ib STRAIN

ATGAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTT

TTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTT 

TAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA 

TTAAGACAAAGGGGCTTTCTAAGCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATACT 

TATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGA 

ACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTC^ 

TC^CTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCASGAATTATTGTTGAGCAAGGGGCC^ 
GAAGTAT 

SEQ ID NO. 1509: SAG0492 FROM THE JM9130013 GBS TYPE VIII STRAIN REVERSE COMPLEMENT

GGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCTC
GAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATA

ATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATT 
GACAAAGGGGCTTTCTAAGCTTGATGCTC^GACAAAAGCATATGAGCT 
CAGCTAGCTTATCTGGAGGAC^CAACAACGAATTGCTATTGCAA 
ACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATT^ 

TCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCAGGAATTATTGTTGAGCAAGGGGCCCCTAAGGAAG 
TATTTAGCAAAACAAAAGAAAT 

SEQ ID NO. 1510: SAG0492 FROM THE M732 GBS TYPE III STRAIN 
GGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAAC 

CTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTA 

TTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAAGGGAC^ 

CGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCAAGCTTATCTGG 

SEQ ID NO. 1511: SAG0492 FROM THE COH1 GBS TYPE Ia STRAIN

ATTGACTTGGATATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCT 

CTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCG 

TGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTA

TCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGA

TGG 
Table 15: Comparative Sequences relating to SAG0492

SEQ1501 TGACTTGG 

SEQ1502 TTGGGAAAAATGAGGTTTTAAAAGGCATTGACTTGG 

SEQ1503 AAAAATGAGGTTTTAAAAGGCATTGACTTGG

SEQ1504 GAGGTTTTAAAAGGCATTGACTTGG

SEQ1505

SEQ1506 AATACAAGGACTTCATAAAAGTTTTGGGAAAAATGAGGTTTTAAAAGGCATTGACTTGG

SEQ1507 GACTTGG

SEQ1508 ATGAGGTTTTAAAAGGCATTGACTTGG

SEQ1509 GGTTTTAAAAGGCATTGACTTGG

SEQ1510 

SEQ1511 ATTGACTTGG 

SEQ1501 TATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACAT 

SEQ1502 TATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACAT 

SEQ1503 TATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACAT 

SEQ1504 TATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACAT 

SEQ1505 TGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACAT 

SEQ1506 TATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACAT 

SEQ1507 TATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACAT 

SEQ1508 TATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACAT 

SEQ1509 TATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACAT 

SEQ1510 GGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACAT 

SEQ1511 TATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGG^AAGTCAACAT 

SEQ1501 TTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGAA 

SEQ1502 TTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGA 

SEQ1503 TTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGA 

SEQ1504 TTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGA 

SEQ1505 TTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGA 

SEQ1506 TTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGA

SEQ1507 TTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGA

SEQ1508 TTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGA

SEQ1509 TTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGA

SEQ1510 TTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGA

SEQ1511 TTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGA

SEQ1501 TTGATATAACAGACAAAAAAAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTT

SEQ1502 TTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTT 

SEQ1503 TTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTT 

SEQ1504 TTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTT 

SEQ1505 TTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTT

SEQ1506 TTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTT

SEQ1507 TTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTT

SEQ1508 TTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTT

SEQ1509 TTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTT

SEQ1510 TTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTT

SEQ1511 TTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTT 

SEQ1501 TTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA 

SEQ1502 TTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA 

SEQ1503 TTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA

SEQ1504 TTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA

SEQ1505 TTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA

SEQ1506 TTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA

SEQ1507 TTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA

SEQ1508 TTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA 

SEQ1509 TTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA 

SEQ1510 TTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA 

SEQ1511 TTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTA
SEQ1501 TTAAGACAAAGGGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAA
SEQ1502 TTAAGACAAAGGGGCTTTCTAATCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAA
SEQ1503 TTAAGACAAAGGGGCTTTCTAATCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAA
SEQ1504 TTAAGACAAAGGGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAA
SEQ1505 TTAAGACAAAGGGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAA
SEQ1506 TTAAGACAAAGGGGCTTTCTAAGCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAA
SEQ1507 TTAAGACAAAGGGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAA
SEQ1508 TTAAGACAAAGGGGCTTTCTAAGCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAA
SEQ1509 TTAAGACAAAGGGGCTTTCTAAGCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAA
SEQ1510 TTAAGACAAAGGGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAA
SEQ1511 TTAAGACAAAGGGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAA

SEQ1501 AAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCTAGCTTATCTGGAGGACAACAAC
SEQ1502 AAGTTGGACTCAAAGAGAAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAAC
SEQ1503 AAGTTGGACTCAAAGAGAAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAAC
SEQ1504 AAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCAAGCTTATCTGGAGGACAACAAC
SEQ1505 AAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCTAGCTTATCTGGAGGGCAACAAC
SEQ1506 AAGTTGGACTCAAAGAGAAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAAC
SEQ1507 AAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCTAGCTTATCTGGAGGACAACAAC
SEQ1508 AAGTTGGACTCAAAGAGAAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAAC
SEQ1509 AAGTTGGACTCAAAGAGAAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAAC
SEQ1510 AAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCAAGCTTATCTGG
SEQ1511 AAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCAAGCTTATCTGG

SEQ1501 ACGGATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAAC
SEQ1502 ACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTCATGTCCTTCTTTTTGATGAAC
SEQ1503 ACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAAC
SEQ1504 ACGGATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAAC
SEQ1505 ACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAAC
SEQ1506 ACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAAC
SEQ1507 ACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAAC
SEQ1508 ACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAAC
SEQ1509 ACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAAC
SEQ1510
SEQ1511

SEQ1501 TACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAG
SEQ1502 TACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAG
SEQ1503 TACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAG
SEQ1504 TACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAG
SEQ1505 TACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAG
SEQ1506 TACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAG
SEQ1507 TACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAG
SEQ1508 TACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAG
SEQ1509 TACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAG
SEQ1510
SEQ1511

SEQ1501 TAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAG
SEQ1502 TAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAG
SEQ1503 TAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAG
SEQ1504 TAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAG
SEQ1505 TAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAG
SEQ1506 TAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAG
SEQ1507 TAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAG
SEQ1508 TAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAG
SEQ1509 TAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAG
SEQ1510
SEQ1511
SEQ1501 GGATCGTGTCATTTTTATGGATGCAGGCATTATTGT-GAGCAAGGGACCCCTAAGGAAG
SEQ1502 GGATCGTGTCATTTTTATGGACGCAGAAATTAT
SEQ1503 GGATCGTGTCATTTTTATGGATGCAGGAATTATTGTTGAGCAAGGGGCCC
SEQ1504 GGATCGTGTCATTTTTATGGATGCAGGGATTATTGTTGAGCAAGGGACCCCTAAGAAAG
SEQ1505 GGATCGTGTCATTTTTATGGATGCAGGCATTATTGTTGASCAAGGGACCCCTAAGGAAG
SEQ1506 GGATCGTGTCATTTTTATGGATGCAGGAATTATTGTGAGCAAGGGGCCCCTAAGGAAGT
SEQ1507 GGATCGTGTC-TTTTTATGGATGCGGGAATTATTGT-GAGCAAGGGACC
SEQ1508 GGATCGTGTCATTTTTATGGATGCASGAATTATTGTTGAGCAAGGGGCCCCTAAGGAAG
SEQ1509 GGATCGTGTCATTTTTATGGATGCAGGAATTATTGTTGAGCAAGGGGCCCCTAAGGAAG
SEQ1510
SEQ1511

SEQ1501 AT-
SEQ1502
SEQ1503
SEQ1504 AT
SEQ1505 A
SEQ1506 TTTGAGCAGACAAAAGAAATCCGCACAAGAGATTTCTT
SEQ1507
SEQ1508 AT
SEQ1509 ATTTAGCAAAACAAAAGAAAT
SEQ1510
SEQ1511
Table 16: Comparative Sequences relating to SAG0767 (D-alanine - D-alanine ligase)

SEQ ID NO. 1601: SAG0767 FROM THE M781 GBS TYPE III STRAIN 

TGGTCGCTCTGTCGGAACGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTT 
GTTAAAACTTATTTTATCACGCAAGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAA 
GTTAATGACAAACCAAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCC 
CCGTTTTACATGGACCAATGGGGGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACT 
AATATTCTATCTTCAAGCGTGGCTATGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAGGTGTACCTCAGGTTGC 
ATATCAAACTTATTTTGAGGGTGATGATTTGGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTG 
TAAAACCGGCTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTA 
GCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAA
TGATGTTAAGACAACTTTTCCTGGCGAAGTTGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATA
AAATTACTATGGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAA
GCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATAC
AATGCCCGGTTTTACTCAGTGGTCAATGTATCCTCTGCTTTGGGAAAATATGGGGCTAACTTATAGTGATTTGATTG 

SEQ ID NO. 1602: SAG0767 FROM THE 090 GBS TYPE Ia STRAIN

AAACCGGGCATTGTATTCAGTTCGTTTAAGAAGACTTGTCCATCTTTCGTCAAAAAGAAATCACAGCGTGATAAACCACA
AGCCCCGATTGCTTTAAAAGCTTTACTTGCATATTGACGCATTGCTTCCATAGTTGCTTCATCAACTTTAGCTGGAATAT 
CCATAGTAATTTTATTATCAATATATTTGGCGTCATAGTCATAGAAATCGACGTCTTTAACGACTTCGCCAGGAAAAGTT 
GTCTTAACATCATTATTGCCTAAAATACCTACTTCAATTTCACGAGCTGTCACGCCTTGTTCAATCAAAATACGGCTATC 
ATACTTGAGAGCTAAGTCAATksCAGAGCGAAGTGAGGATTCATCTGTCGCTTTTGAAATACCTACTGATGACCCCATAT
TAGCCGGTTTTACAAAAATTGGGAAACTTAAAGTTTCTAAAGAGAGTTTAATCGCATGTTCCAAATCATCACCCTCAAAA 
TAAGTTTGATATGCAACCTGAGGTACACCTACTGTTGCAAGGACTTGTTTTGTTGTAATTTTATCCATAGCCACGCTTGA 
AGATAGAATATTAGTCCCAACATAAGGCATCCTTAAAACTTCTAAAAATCCTTGGATAGAACCATCTTCCCCCATTGGTC 
CATGTAAAACGGGGAAAACAATTGCATTATCATCATAGATATCACTTGGACGAACCATTTTGTCTAAATCAACAGTTTGG
TTTGTCATTAACTTTTCATCTGAAGATGGCATTTCATCAAATTCTTGTGTTTTAATAAATTGACCTACTTGCGTG 

SEQ ID NO. 1603: SAG0767 FROM THE COH1 TYPE Ia STRAIN

TCGCTCTGCGGAACGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTA 
AAACTTATTTTATCACGCAAGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTA
ATGACAAACCAAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCCCGT 
TTTACATGGACCAATGGGGGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATA 
TTCTATCTTCAAGCGTGGCTAT 

SEQ ID NO. 1604: SAG0767 FROM THE CJB110 GBS NONTYPEABLE STRAIN REVERSE 
COMPLEMENT 

CGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAGTTGATGAAGCAA 
CTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTT 
TTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCC 

SEQ ID NO. 1605: SAG0767 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

AACGTGAAGTATCTGTACTGCTCTGCAGAAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATT
TTATCACGCAAGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAA 

SEQ ID NO. 1606: SAG0767 FROM THE 1169NT1 GBS TYPE V STRAIN REVERSE COMPLEMENT 

CTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAG 

TATGATAGCCGTATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAA 

GACAACTTTTCCTGGCGAAGTCGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTA 

TGGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGG 

GCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCCGG 

TTTTACTCAGTGGTCAATGTATCCTCTGCTTTGGGAAAAT 

SEQ ID NO. 1607: SAG0767 FROM THE 18RS21 GBS TYPE II STRAIN REVERSE COMPLEMENT 

TTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTA 

GGCAATAATGATGTTAAGACAACTTTTCCTGGCGAAGTCGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATAT

TGATAATAAAATTACTATGGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAG

CTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAA 

CTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTATCCCCTGCTTTGGGAAAAGTATGGGGCTAACCTT 

SEQ ID NO. 1608: SAG0767 FROM THE 18RS21 GBS TYPE II STRAIN 

ATCTGTACTGTCTGCAGAAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAA
GTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAACCAAACTGTTGA
TTTAGACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAATGGGGG 
AAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAA 

SEQ ID NO. 1609: SAG0767 FROM THE 2603V/R GBS TYPE V STRAIN REVERSE COMPLEMENT 

GGCTATGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAGGTGTACCTCAGGTTGCATATCAAACTTATTTTGAGG

GTGATGATTTGGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTGTAAAACCGGCTAATATGGGG 

TCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCG 

TATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTC 

CTGGCGAAGTCGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCA 

GCTAAAGTTGATGAAGCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTT 

ATCACGCTGTGATTTCTTTTTGACGAAAGAATGGACAAATCTTCTTAAACGAACTGAAATAC 

SEQ ID NO. 1610: SAG0767 FROM THE 2603V/R GBS TYPE V STRAIN 

TCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGT
AGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAACCAAACTGTTGATT 
TAGACAAAATGGTTCGTCCAAGTGATATCTATGATGATAAT 

SEQ ID NO. 1611: SAG0767 FROM THE H36b GBS TYPE Ib STRAIN REVERSE COMPLEMENT

AAAACCGGCTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAG 

CTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAAT 

GATGTTAAGACAACTTTTCCTGGCGAAGTCGTTAAAGACGTCGATTtCTATGACTATGACGCCAAATATATTGATAATAA 

AATTACTATGGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAG

CAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACA 

ATGCCCGGTTTTACTCAGTGGTCAATGTATCCCCTGCTTTGGGAAAATATGGGGCTAACTTATAG 

SEQ ID NO. 1612: SAG0767 FROM THE H36b TYPE Ib STRAIN

CGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTAT 
CACGCAAGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAACCAAA
CTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCA
ATGGGGGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAAG 
CGTGGCTATGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAG 

SEQ ID NO. 1613: SAG0767 FROM THE M732 GBS TYPE III STRAIN REVERSE COMPLEMENT 

ATGCGATTAAACTCTCTTTAGAACCTTTAAGTTTCCCAATTTTTGTAAACCCGGCTAATATGGGGTCATCAGTAGGTATT 

TCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACA 

AGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTCCTGGCGAAGTTGTTA 

AAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAGTTGATGAA 

GCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTT 

CTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTATCCTC

TGCTTTGGGAAAATATGGGGCTAACTT 

SEQ ID NO. 1614: SAG0767 FROM THE M732 GBS TYPE III STRAIN 

GTCATGCCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAGGTCAATTTATTAAAACAC 
AAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAACCAAACTGTTGATTTAGACAAAATGGTTCGTCCA 
AGTGATATCTATGATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAATGGGGGAAGATGGTTCTATCCAAGGATT 
TTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAAGCGTGGCTATGGATAAAATTACAACAAAAC 
AAGTCCTTGCAACAGTAGGTGTACCTCAGG 

SEQ ID NO. 1615: SAG0767 FROM THE A909 GBS TYPE Ia STRAIN REVERSE COMPLEMENT

TTTTGAGGGTGATGATTTGGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTGTAAAACCGGCTA

ATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTAT 

GATAGCCGTATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGAC

AACTTTTCCTGGCGAAGTCGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGG 

ATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCT

TGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCCGGTTT

TACTCAGTGGTCAATGTATCCCCTGCTTTGGGAAAATATGGGGCTAACTTATAGTGA 

SEQ ID NO. 1616: SAG0767 FROM THE A909 GBS TYPE Ia STRAIN

TGGTCGCTCTGCGGAACGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTG 
TTAAAACTTATTTTATCACGCAAGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAG
TTAATGACAAACCAAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCC 
CGTTTTACATGGACCAATGGGGGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTA 
ATATTCTATCTTCAAGCGTGGCTATGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAGG



SEQ ID NO. 1617: SAG0767 FROM THE JM9130013 GBS TYPE VIII STRAIN REVERSE 
COMPLEMENT 

AAGCAGGGGATACATTGACCACTGAGTAAAACCGGGCATTGTATTCAGTTCGTTTAAGAAGATCTGTCCATCTTTCGTCA
AAAAGAAATCACAGCGTGATAAACCACAAGCCCCGATTGCTTTAAAAGCTTTACTTGCATATTGACGCATTGCTTCCATA
GATGCTTCATCAACTTTAGCTGGAATATCCATAGCAATTTTATTATCAATATATTTGGCG

SEQ1601 GGTCGCTCTGTCGGAACGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTA 

SEQ1602 

SEQ1603 

SEQ1604 

SEQ1605 

SEQ1606 

SEQ1607 

SEQ1608 

SEQ1609

SEQ1610 

SEQ1611 

SEQ1612 

SEQ1613 

SEQ1614 

SEQ1615 

SEQ1616 

SEQ1617 

SEQ1601 TAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAGGTCAATTTATTA

SEQ1602 

SEQ1603 

SEQ1604 

SEQ1605 

SEQ1606

SEQ1607 

SEQ1608 

SEQ1609 

SEQ1610 

SEQ1611 

SEQ1612 

SEQ1613 

SEQ1614 

SEQ1615 

SEQ1616 

SEQ1617 

SEQ1601 AACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAACCAAACTG

SEQ1602 

SEQ1603 

SEQ1604 

SEQ1605 

SEQ1606 

SEQ1607 

SEQ1608 

SEQ1609 

SEQ1610 

SEQ1611 

SEQ1612 

SEQ1613 

SEQ1614 

SEQ1615 

SEQ1616 

SEQ1617 



SEQ1601 TGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCC 

SEQ1602 

SEQ1603 

SEQ1604 

SEQ1605 

SEQ1606 

SEQ1607 

SEQ1608

SEQ1609 

SEQ1610 

SEQ1611 

SEQ1612 

SEQ1613 

SEQ1614 

SEQ1615 

SEQ1616 

SEQ1617 

SEQ1601 CGTTTTACATGGACCAATGGGGGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAA 

SEQ1602 

SEQ1603 

SEQ1604 

SEQ1605 

SEQ1606 

SEQ1607 

SEQ1608 

SEQ1609 

SEQ1610 

SEQ1611 

SEQ1612

SEQ1613 

SEQ1614 

SEQ1615 

SEQ1616 

SEQ1617 

SEQ1601 GATGCCTTATGTTGGGACTAATATTCTATCTTCAAGCGTGGCTATGGATAAAATTACAA 

SEQ1602 

SEQ1603 

SEQ1604 

SEQ1605

SEQ1606 

SEQ1607 

SEQ1608 

SEQ1609 GGCTATGGATAAAATTACAA 

SEQ1610

SEQ1611 

SEQ1612 

SEQ1613 

SEQ1614 

SEQ1615 

SEQ1616 

SEQ1617 
SEQ1601 AAAACAAGTCCTTGCAACAGTAGGTGTACCTCAGGTTGCATATCAAACTTATTTTGAGG
SEQ1602
SEQ1603
SEQ1604
SEQ1605
SEQ1606
SEQ1607
SEQ1608
SEQ1609 AAAACAAGTCCTTGCAACAGTAGGTGTACCTCAGGTTGCATATCAAACTTATTTTGAGG
SEQ1610
SEQ1611
SEQ1612
SEQ1613
SEQ1614
SEQ1615 TTTTGAGG
SEQ1616
SEQ1617

SEQ1601 TGATGATTTGGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTG
SEQ1602
SEQ1603
SEQ1604
SEQ1605
SEQ1606
SEQ1607
SEQ1608
SEQ1609 TGATGATTTGGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTG
SEQ1610
SEQ1611
SEQ1612
SEQ1613 ATGCGATTAAACTCTCTTTAGAACCTTTAAGTTTCCCAATTTTTG
SEQ1614
SEQ1615 TGATGATTTGGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTG
SEQ1616
SEQ1617

SEQ1601 AAAACCGGCTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCAC
SEQ1602 AAACCGGGC
SEQ1603 TCGCTCTGCGGAACGTGAAGTATCTGTACTG-TCTGCAGAAA-GCGT
SEQ1604
SEQ1605 AACGTGAAGTATCTGTACTGCTCTGCAGAAAAGCGT
SEQ1606 CTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCAC
SEQ1607
SEQ1608 ATCTGTACTG-TCTGCAGAAAAGCGT
SEQ1609 AAAACCGGCTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCAC
SEQ1610 TCTGTACTG-TCTGCAGAAA-GCGT
SEQ1611 AAAACCGGCTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCAC
SEQ1612 CGTGAAGTATCTGTACTG-TCTGCAGAAA-GCGT
SEQ1613 AAACCCGGCTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCAC
SEQ1614
SEQ1615 AAAACCGGCTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCAC
SEQ1616 TGGTCGCTCTGCGGAACGTGAAGTATCTGTACTG-TCTGCAGAAA-GCGT
SEQ1617 AAGCAGGGGATACATTGACCACTGAGTAAAACCGGGC

SEQ1601 TCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCG
SEQ1602 TTGT-ATTCAGTTCGTTTAAGAAGACTTGTCCATCTTTCGTCAAAAAGAAATCACAGCG
SEQ1603 ATGC-GTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAG
SEQ1604
SEQ1605 ATGC-GTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAG
SEQ1606 TCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCG
SEQ1607 TTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCG
SEQ1608 ATGC-GTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAG
SEQ1609 TCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCG



SEQ1610 ATGC-GTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAG

SEQ1611 TCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCG 

SEQ1612 ATGC-GTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAG 

SEQ1613 TCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCG 

SEQ1614 ATGCCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAG 

SEQ1615 TCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCG 

SEQ1616 ATGC-GTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAG 

SEQ1617 TTGT-ATTCAGTTCGTTTAAGAAGATCTGTCCATCTTTCGTCAAAAAGAAATCACAGCG 

SEQ1601 GACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTC

SEQ1602 GATAAACCACAAGC CCCGATTGCTTTAAAAGCTTTACTTGCATATTGACGCATTG

SEQ1603 GTCAATTTATTAAA ACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTA

SEQ1604

SEQ1605 GTCAATTTATTAAA ACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAA

SEQ1606 GACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTC

SEQ1607 GACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTC

SEQ1608 GTCAATTTATTAAA ACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTA

SEQ1609 GACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTC

SEQ1610 GTCAATTTATTAAA ACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTA

SEQ1611 GACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTC

SEQ1612 GTCAATTTATTAAA ACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTA

SEQ1613 GACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTC

SEQ1614 GTCAATTTATTAAA ACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTA

SEQ1615 GACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTC

SEQ1616 GTCAATTTATTAAA ACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTA

SEQ1617 GATAAACCACAAGC CCCGATTGCTTTAAAAGCTTTACTTGCATATTGACGCATTG

SEQ1601 TGGCGAAGTTGTTAAAGACGTCGATTTCTATGA--CTATGACGCCAAAT-ATATTGATA

SEQ1602 TTCCATAGTT GCTTCATCAACTTTAGCTGGAATATCCATAGTAATTTTATTATCA

SEQ1603 TGACAAACC AAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATG

SEQ1604 CGTCGATTTCTATGA--CTATGACGCCAAAT-ATATTGATA

SEQ1605

SEQ1606 TGGCGAAGTCGTTAAAGACGTCGATTTCTATGA--CTATGACGCCAAAT-ATATTGATA

SEQ1607 TGGCGAAGTCGTTAAAGACGTCGATTTCTATGA--CTATGACGCCAAAT-ATATTGATA

SEQ1608 TGACAAACC AAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATG

SEQ1609 TGGCGAAGTCGTTAAAGACGTCGATTTCTATGA--CTATGACGCCAAAT-ATATTGATA

SEQ1610 TGACAAACC AAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATG

SEQ1611 TGGCGAAGTCGTTAAAGACGTCGATTTCTATGA--CTATGACGCCAAAT-ATATTGATA

SEQ1612 TGACAAACC AAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATG

SEQ1613 TGGCGAAGTTGTTAAAGACGTCGATTTCTATGA--CTATGACGCCAAAT-ATATTGATA

SEQ1614 TGACAAACC AAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATG

SEQ1615 TGGCGAAGTCGTTAAAGACGTCGATTTCTATGA--CTATGACGCCAAAT-ATATTGATA

SEQ1616 TGACAAACC AAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATG

SEQ1617 TTCCATAGAT GCTTCATCAACTTTAGCTGGAATATCCATAGCAATTTTATTATCA

SEQ1601 TAAAATTACTAT--GGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCG

SEQ1602 TATATTTGGCGTCATAGTCATAGAAATCGACGTCTTTAACGACTTCGCCAGG--AAAAG

SEQ1603 TGATAATGCAAT--TGTTTTCCCCGTTTTAC ATGGACCAATGGGGGAAG--ATGGT

SEQ1604 TAAAATTACTAT--GGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCG

SEQ1605

SEQ1606 TAAAATTACTAT--GGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCG

SEQ1607 TAAAATTACTAT--GGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCG

SEQ1608 TGATAATGCAAT--TGTTTTCCCCGTTTTAC ATGGACCAATGGGGGAAG--ATGGT

SEQ1609 TAAAATTACTAT--GGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCG

SEQ1610 TGATAAT

SEQ1611 TAAAATTACTAT--GGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCG

SEQ1612 TGATAATGCAAT--TGTTTTCCCCGTTTTAC ATGGACCAATGGGGGAAG--ATGGT

SEQ1613 TAAAATTACTAT--GGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCG

SEQ1614 TGATAATGCAAT--TGTTTTCCCCGTTTTAC ATGGACCAATGGGGGAAG--ATGGT

SEQ1615 TAAAATTACTAT--GGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCG

SEQ1616 TGATAATGCAAT--TGTTTTCCCCGTTTTAC ATGGACCAATGGGGGAAG--ATGGT

SEQ1617 TATATTTGGCG
SEQ1601 CAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTT
SEQ1602 T--GTCTTAACATCATTATTGCCTAAAATACCTACTTCAATTTCACGAGCTGTCACGCC
SEQ1603 C--TATCCAAG--GATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCT
SEQ1604 CAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTT
SEQ1605
SEQ1606 CAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTT
SEQ1607 CAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTT
SEQ1608 C--TATCCAAG--GATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCT
SEQ1609 CAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTT
SEQ1610
SEQ1611 CAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTT
SEQ1612 C--TATCCAAG--GATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCT
SEQ1613 CAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTT
SEQ1614 C--TATCCAAG--GATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCT
SEQ1615 CAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTT
SEQ1616 C--TATCCAAG--GATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCT
SEQ1617

SEQ1601 TTTTTGACGAAAGA--TGGACAAATCTTCTTAAACGAACTGAA-TACAATGCCCGGTTT
SEQ1602 TGTTCAATCAAAATACGGCTATCATACTTGAGAGCTAAGTCAATKSCAGAGCGAAGTGA
SEQ1603 TCTTCAAGCGTGGCTAT
SEQ1604 TTTTTGACGAAAGA--TGGACAAATCTTCTTAAACGAACTGAA-TACAATGCCC
SEQ1605
SEQ1606 TTTTTGACGAAAGA--TGGACAAATCTTCTTAAACGAACTGAA-TACAATGCCCGGTTT
SEQ1607 TTTTTGACGAAAGA--TGGACAAATCTTCTTAAACGAACTGAA-TACAATGCCCGGTTT
SEQ1608 TCTTCAA
SEQ1609 TTTTTGACGAAAGAA-TGGACAAATCTTCTTAAACGAACTGAAATAC
SEQ1610
SEQ1611 TTTTTGACGAAAGA--TGGACAAATCTTCTTAAACGAACTGAA-TACAATGCCCGGTTT
SEQ1612 TCTTCAAGCGTGGCTATGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAG-
SEQ1613 TTTTTGACGAAAGA--TGGACAAATCTTCTTAAACGAACTGAA-TACAATGCCCGGTTT
SEQ1614 TCTTCAAGCGTGGCTATGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAGGTGT
SEQ1615 TTTTTGACGAAAGA--TGGACAAATCTTCTTAAACGAACTGAA-TACAATGCCCGGTTT
SEQ1616 TCTTCAAGCGTGGCTATGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAGG
SEQ1617

SEQ1601 ACTCAGTGGTCAATGTATCCTCTGCTTTGGGAAAA-TATGGGGCTAACTTATAGTGATT
SEQ1602 GATTCATCTGTCGCTTTTGAAATACCTACTGATGACCCCATATTAGCCGGTTTTACAAA
SEQ1603
SEQ1604
SEQ1605
SEQ1606 ACTCAGTGGTCAATGTATCCTCTGCTTTGGGAAAA-T
SEQ1607 ACTCAGTGGTCAATGTATCCCCTGCTTTGGGAAAAGTATGGGGCTAACCTT
SEQ1608
SEQ1609
SEQ1610
SEQ1611 ACTCAGTGGTCAATGTATCCCCTGCTTTGGGAAAA-TATGGGGCTAACTTATAG
SEQ1612
SEQ1613 ACTCAGTGGTCAATGTATCCTCTGCTTTGGGAAAA-TATGGGGCTAACTT
SEQ1614 CCTCAGG
SEQ1615 ACTCAGTGGTCAATGTATCCCCTGCTTTGGGAAAA-TATGGGGCTAACTTATAGTGA--
SEQ1616
SEQ1617



SEQ1601 GATTG 

SEQ1602 ATTGGGAAACTTAAAGTTTCTAAAGAGAGTTTAATCGCATGTTCCAAATCATCACCCTC 

SEQ1603 

SEQ1604 

SEQ1605 

SEQ1606

SEQ1607

SEQ1608

SEQ1609

SEQ1610

SEQ1611

SEQ1612

SEQ1613

SEQ1614

SEQ1615

SEQ1616

SEQ1617

SEQ1601 

SEQ1602 AAATAAGTTTGATATGCAACCTGAGGTACACCTACTGTTGCAAGGACTTGTTTTGTTGT 

SEQ1603 

SEQ1604

SEQ1605

SEQ1606

SEQ1607

SEQ1608

SEQ1609

SEQ1610

SEQ1611

SEQ1612

SEQ1613

SEQ1614

SEQ1615

SEQ1616

SEQ1617

SEQ1601

SEQ1602 ATTTTATCCATAGCCACGCTTGAAGATAGAATATTAGTCCCAACATAAGGCATCCTTAA 

SEQ1603 

SEQ1604

SEQ1605

SEQ1606

SEQ1607

SEQ1608

SEQ1609

SEQ1610

SEQ1611

SEQ1612

SEQ1613

SEQ1614

SEQ1615

SEQ1616

SEQ1617

SEQ1601

SEQ1602 ACTTCTAAAAATCCTTGGATAGAACCATCTTCCCCCATTGGTCCATGTAAAACGGGGAA

SEQ1603 

SEQ1604 

SEQ1605 

SEQ1606 

SEQ1607 

SEQ1608 

SEQ1609 

SEQ1610 

SEQ1611 

SEQ1612 

SEQ1613 

SEQ1614

SEQ1615

SEQ1616 

SEQ1617 

SEQ1601 

SEQ1602 ACAATTGCATTATCATCATAGATATCACTTGGACGAACCATTTTGTCTAAATCAACAGT 

SEQ1603 

SEQ1604 

SEQ1605 

SEQ1606 

SEQ1607 

SEQ1608 

SEQ1609 

SEQ1610 

SEQ1611 

SEQ1612

SEQ1613 

SEQ1614 

SEQ1615 

SEQ1616

SEQ1617 

SEQ1601 

SEQ1602 TGGTTTGTCATTAACTTTTCATCTGAAGATGGCATTTCATCAAATTCTTGTGTTTTAAT 

SEQ1603 

SEQ1604 

SEQ1605 

SEQ1606 

SEQ1607 

SEQ1608 

SEQ1609 

SEQ1610 

SEQ1611 

SEQ1612 

SEQ1613 

SEQ1614 

SEQ1615 

SEQ1616 

SEQ1617 



SEQ1601 

SEQ1602 AATTGACCTACTTGCGTG

SEQ1603 

SEQ1604 

SEQ1605 

SEQ1606 

SEQ1607 

SEQ1608

SEQ1609 

SEQ1610 

SEQ1611 

SEQ1612

SEQ1613 

SEQ1614 

SEQ1615 

SEQ1616 

SEQ1617 
Table 17: Comparative Sequences relating to SAG1086 (xanthine phosphoribosyltransferase)



SEQ ID NO. 1701: SAG1086 FROM THE1169NT1 GBS NONTYPEABLE STRAIN 

TTTAAAGGTTGATTCCTTTTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAAT^

CCGGCATTACGAAGGTTGTTACGATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTO 
GCTAAAAAGGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGWTACGAGTCAAGTTTC 
TATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTO 

AAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTGTTA 
ACAGGTGTTCCAGT 

SEQ ID NO. 1702: SAG0767 FROM THE 18RS21 GBS TYPE II STRAIN 

TTTAGGTGAGAACATTTTAAAGGTTGATTCTTTTTTGACTCATCAGGTAGAT^ 

ATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACGATTGAAGCATCTGGAATTGCACCAGCAGTGTACGCAGCTCAAGC^
GKACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCT^

TACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGC^ 

CTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGT 

GATTTGTTAGAAAAAACA 

SEQ ID NO. 1703: SAG0767 FROM THE H36b GBS TYPE Ib STRAIN

AAGAACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAGTTA 
ATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCC 
AGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTG 
CTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTO 

GATGACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCT
TATTGAAAAATCTTTCCAAGATGGGCGTGATT 



SEQ ID NO. 1704: SAG0767 FROM THE M732 GBS TYPE III STRAIN 

ATTCTTTTTTGACTATCAGGTAAATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGA 

AGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCT 

AAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTAC^^ 

CTTTTTATCTAACGATGATACTGTACT^^ 

AAGCTGAAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTGTTAGAAAAAACAGGTGTTC 
GTTACTTCTCTTGCTCGT 

SEQ ID NO. 1705: SAG0767 FROM THE M781 GBS TYPE III STRAIN REVERSE COMPLEMENT 

GAACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAA 
GCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACG^^ 

CAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATRTTAACTGCT 
GAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATGATTGA 
TGACTTTTTAACAAACGGTCAAGC 



SEQ ID NO. 1706: SAG0767 FROM THE 090 GBS TYPE Ia STRAIN REVERSE COMPLEMENT

ACATTTTAAAGGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAA 

GAAGCCGGCATTACGAAGGTTGTTACGATTGAAGCATCTGGAATTGCACC^^ 

ATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAG 
TTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACMGTCYAGCGGCTAAAGGATTA 
CTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTGTTAGA 
AAA 



SEQ ID NO. 1707: SAG0767 FROM THE A909 GBS TYPE Ia STRAIN REVERSE COMPLEMENT

ACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAGTTAATGC 

AGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCA 

GTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGA 

AGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG 

ACTTTTTAGCAAACGGKCAAGCGGSTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTA 

SEQ ID NO. 1708: SAG0767 FROM THE COH1 GBS TYPE Ia STRAIN

TTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAAATTTTGACT 

CCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGT
GCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCT 

TATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTG 

AAATTATTGGTCAAGCTGAAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTGTTAGAA 

ACAGGTGTTCCGGTTAC 

SEQ ID NO. 1709: SAG0767 FROM THE CJB110 GBS NONTYPEABLE STRAIN REVERSE COMPLEMENT

GCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAA 

GGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAA 
AAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACGGTCAA 
GCGGCTAAAGGATTACTTGAAATTTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGG 
GGCGTGATTTGTTAGAAAAAACAGGTGTTCCAGT 

SEQ ID NO. 1710: SAG0767 FROM THE 2603 V/R GBS TYPE V STRAIN

AACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTTT^ 
CAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCT 

AGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTG 
AAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGAT 
GACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGCT 
TGAAAAATCTTTCCAAGATGGGCGTGATTTGTTAGAAAAAACAGGTGTTCCAG 

SEQ ID NO. 1711: SAG0767 FROM THE JM9130013 GBS TYPE VIII STRAIN REVERSE COMPLEMENT

ACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAA 
AGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGA^ 

GTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACT
GGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGA 



SEQ1701 TTTAAAGGTTGATTCCT

SEQ1702 TTTAGGTGAGAACATTTTAAAGGTTGATTCTT

SEQ1703 AGAACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTT

SEQ1704 ATTCT

SEQ1705 -GAACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTT

SEQ1706 ACATTTTAAAGGTTGATTCTT

SEQ1707 ACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTT

SEQ1708 TTTAAAAGTTGATTCTT

SEQ1709

SEQ1710 --AACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTT

SEQ1711

SEQ1701 TTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATA 

SEQ1702 TTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATA 

SEQ1703 TTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATA 

SEQ1704 TTTTGACTATCAGGTAAATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATA 

SEQ1705 TTTGACTCATCAGGTAAATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATA 

SEQ1706 TTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATA

SEQ1707 TTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATA 

SEQ1708 TTTGACTCATCAGGTAAATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATA 

SEQ1709 GCTGATA 

SEQ1710 TTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATA 

SEQ1711 

SEQ1701 ATATAAAGAAGCCGGCATTACGAAGGTTGTTACGATTGAAGCATCTGGAATTGCGCCAG 

SEQ1702 ATATAAAGAAGCCGGCATTACGAAGGTTGTTACGATTGAAGCATCTGGAATTGCACCAG

SEQ1703 ATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAG

SEQ1704 ATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAG

SEQ1705 ATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAG

SEQ1706 ATATAAAGAAGCCGGCATTACGAAGGTTGTTACGATTGAAGCATCTGGAATTGCACCAG

SEQ1707 ATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAG

SEQ1708 ATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAG

SEQ1709 ATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAG

SEQ1710 ATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAG

SEQ1711 ACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAG

SEQ1701 CAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAGGCTAAGAACA 

SEQ1702 CAGTGTACGCAGCTCAAGCATTGGGCGKACCAATGATATTTGCTAAAAAAGCTAAGAACA

SEQ1703 CAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACA 

SEQ1704 CAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACA 

SEQ1705 CAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACA 

SEQ1706 CAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACA 

SEQ1707 CAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACA

SEQ1708 CAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACA 

SEQ1709 CAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACA 

SEQ1710 CAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACA 

SEQ1711 CAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACA
SEQ1701 TTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGWTACGA
SEQ1702 TTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGA
SEQ1703 TTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGA
SEQ1704 TTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGA
SEQ1705 TTACTATGACTGAAGGTATRTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGA
SEQ1706 TTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGA
SEQ1707 TTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGA
SEQ1708 TTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGA
SEQ1709 TTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGA
SEQ1710 TTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGA
SEQ1711 TTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGA

SEQ1701 GTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG
SEQ1702 GTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG
SEQ1703 GTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG
SEQ1704 GTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG
SEQ1705 GTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG
SEQ1706 GTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG
SEQ1707 GTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG
SEQ1708 GTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG
SEQ1709 GTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG
SEQ1710 GTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG
SEQ1711 GTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATG

SEQ1701 ACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATT-ATTGGTCAAGCTGGA
SEQ1702 ACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATT-ATTGGTCAAGCTGGA
SEQ1703 ACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATT-ATTGGTCAAGCTGGA
SEQ1704 ACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATT-ATTGGTCAAGCTGAA
SEQ1705 ACTTTTTAACAAACGGTCAAGC
SEQ1706 ACTTTTTAGCAAACMGTCYAGCGGCTAAAGGATTACTTGAAATT-ATTGGTCAAGCTGGA
SEQ1707 ACTTTTTAGCAAACGGKCAAGCGGSTAAAGGATTACTTGAAATT-ATTGGTCAAGCTGGA
SEQ1708 ACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATT-ATTGGTCAAGCTGAA
SEQ1709 ACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATTTATTGGTCAAGCTGGA
SEQ1710 ACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATT-ATTGGTCAAGCTGGA
SEQ1711 ACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATT-ATTGGTCAAGCTGGA

SEQ1701 CTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTG
SEQ1702 CTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTG
SEQ1703 CTAAGGTTGCTGGTATCGGAATCYTTATTGAAAAATCTTTCCAAGATGGGCGTGATT--
SEQ1704 CTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTG
SEQ1705
SEQ1706 CTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTG
SEQ1707 CTA
SEQ1708 CTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTG
SEQ1709 CTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTG
SEQ1710 CTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTG
SEQ1711 CTAAGGTTGCTGGTATCGGA

SEQ1701 TAGAAAAAACAGGTGTTCCAGT
SEQ1702 TAGAAAAAACA
SEQ1703
SEQ1704 TAGAAAAAACAGGTGTTCCGGTTACTTCTCTTGCTCGT
SEQ1705
SEQ1706 TAGAAAA
SEQ1707
SEQ1708 TAGAAAAAACAGGTGTTCCGGTTAC
SEQ1709 TAGAAAAAACAGGTGTTCCAGT
SEQ1710 TAGAAAAAACAGGTGTTCCAG
SEQ1711

Table 18: Comparative Sequences relating to SAG1600 (glutamate racemase)



SEQ ID NO. 1801: SAG1600 FROM THE H36b GBS TYPE Ib STRAIN REVERSE COMPLEMENT

AATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTA 

TATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCTGTTGCCTGGCAAGAAATTAAAGAAAAACTAGACGTG 

CCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAACTAATTCAGGGAAAGTTGGTATTATAGGTACT

TGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAA 

TTGTGGAATCAAATCAGATGTCTTCTAGTTTAGCCAAAAAGGTGGTTTATGAAACGTTGTCCCCATTAGTTGGTAAATTAGATACTTTA 
ATTTTAGGTTGCACGCATTATCCCTTATTACGTCCCATCATTCAAAATGTTATGGGGGCTGAGGTTAAATTAATTGATAGTGGCGCAGA 
AACCGTTCGTGATATTTCTGTTTTATTGAACTATTTTGAGATAAACCATAATTGGCAAAATAAACACGGTGGTCATCACTTTTACACAA 
CCGCCAGCCCAA 

SEQ ID NO. 1802: SAG1600 FROM THE M732 GBS TYPE III STRAIN REVERSE COMPLEMENT

AAATGTTCCGTCAACTTCCAGAAGAGGAAGTAATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATT
AGAGAGTTTACCTGGCAGATGGTTAACTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGC 
CTGGCAAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCG 

GGAAAGTTGGTATTATAGGTACTCCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATACTGCTGTG
GTATCCCTTGCTTGTCCGAAATTTGTTCCAATTGTGGAATCAAATC^

GTCCCCATTAGTTGGTAAATTAGATACTTTAATTTTAGGTTGCACGCATTATCCCCTATTACGTCCCATCATTCAAAATGTTATGGGGG 
CTGAGGTTAAATTAATTGATAGTGGCGCAGAAACCGTTCGTGATATTTCTGTTTTATTGAACTATTTTGAGATAAACCATAATTGGCAA 
AATAAACACGGTGGTCATCACTTTTACACAACCGCCAGCCCAAAAGGTTTTAAAGAAA 

SEQ ID NO. 1803: SAG1600 FROM THE 090 GBS TYPE Ia STRAIN

AATCTTCATTGGAGACCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTtACCTGGCAGATGGTTAATTTCTT 
ATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGC^^ 

CTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAACTAATTCAGGGAAAGTTGGTATTATAGGTACTCCCATGACT
GTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATAC^ 

TGTGGAATCAAATCAGATGTCTTCTAGTTTAGCCAAAAAGGTGGTTTATGAAACGCTGTCCCCATTAGTTGGTAA 

TTTTAGGTTGCACGCATTATCCCTTATTACGTCCCATCATTCAAAATGTTATGGGGGCTGAGGTTAAATTAATTGATAGTGGCGCAGAA 

ACCGTTCGTGATATTTCTGTTTTATTGAACTATTTTGAGATaAmCC&TaATTGGsmAAATAAA 

CGsCAGCCCAAAAGGTTTTTAAGGAAATTGCAGAACAATGGCTTAATCAAGAAATAAAT 



SEQ ID NO. 1804: SAG1600 FROM THE A909 GBS TYPE Ia STRAIN

GCGGTTGTGTAAAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATAGTTCAATAAAACAGAAATATCACGAAC 
GGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGGGACGTAATAGGGGATAATGCGTGCAACCTA 
AAATTAAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAAACTAGAAGACATCTGATTTG
ACAATTGGAACAAATTTCGGACAAGCAAGGGATACCACAGCAGTATTTGGAGACAAAGCTTGAATTTTTTGACGATAAGCATCTGATTT 
AACAGTCATGGGAGTACCTATAATACCAACTTTCCCTAAATTAGTTGATTTGATAGCTGCGCTAGCTCCTGGTAAAATAACGCCT 
•CAGGGATGTCTAGTTTTTCTTTAATTTCTTGCCAGGCAACTGCAGTTGCTGTATTACAAGCTATAACAATCATCTTAAC 
AATAAGAAGTTAACCATCTGCCAGGTAAACTCTCTAATCTGTTGAGCAGGTCTAGGACCATACGGAGCTCTAGCCTGATCTCCAATGAA 
GATTACTTCCTCTTCTGGAAGTTGACGGAACATTTCCTTAACAACCGTTAAACCACCT 

SEQ ID NO. 1805: SAG1600 FROM THE COH1 GBS TYPE Ia STRAIN

TTCCGTCAACTTCCAAAATATGAAGTAATCTTCATTGGAGATCAGGCTAGAGCTC

GTTTACCTGGCAGATGGTTAACTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGCCTGGC 

AAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAACTAATTTAGGGAAA

GTTGGTATTATAGGTACTCCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCT 

CCTTGCTTGTCCGAAAT 

SEQ ID NO. 1806: SAG1600 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

GTAATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAATTT 

CTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGCCTGGCAAGAAATTA

TAC 

SEQ ID NO. 1807: SAG1600 FROM THE 1169NT1 GBS TYPE V STRAIN

CTTTTGGGCTGGCGGTTGTGTAAAATTGATGACCACCGTGTTTATTTO 

ATATCACGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATAATGGGACGTAATAGGGGATAATG 
CGTGCAACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAATGTTTCM 

GATTTGATTCC^C^TTGGAACAAATTTCGGACAAGCAAGGGATACCACAGCAGTATTTGGAGACAAAGCTTGTUV 
GCATCTGATTTAACAGTCATGGGAGTACCTATAA 

SEQ ID NO. 1808: SAG1600 FROM THE 1169NT1 GBS TYPE V STRAIN 

GTAATCTTCATTGGGGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAATTT 
CTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTT 

SEQ ID NO. 1809: SAG1600 FROM THE 18RS21 GBS TYPE II STRAIN 

GAAATGTTCCGTCAACTTCCAGAAGAGGAAGTAATCTTCATTGGAGATCAGGCTAGAGCTCCCT 

TAGAGAGTTTACCTGGCAGATGGTTAACTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTO 

CCTGGCAAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAACT 
GGGAAAGTTGGTATTATAGGTACTCCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGC 



SEQ ID NO. 1810: SAG1600 FROM THE 18RS21 GBS TYPE II STRAIN
ATTTCTTTAAAACCTTTTGGGCTGGCGGTTGTGTAATATTGATGACCACCGT^^

CAATAAAACAGAAATATCACGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGGGACGTA
ATATGGGATAATGCGTGCAACCTAAAATTAAAGTA

SEQ ID NO. 1811: SAG1600 FROM THE 2603 V/R GBS TYPE V STRAIN
A ATTTCTTTAAAACCTTTTGGGCTGGCGGTTGTGTAATAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATAG^
TCAATAAAACAGAAATATCACGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTC^GCCCCCATAACATTTTGAATGATTC
AATAGGGGATAATGCGTGCAACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACM
ACTAGAAGACATCTGATTTGATTCCACAATTGGAACAA

SEQ ID NO. 1812: SAG1600 FROM THE M781 GBS TYPE III STRAIN

GGCGGTTGTGTAAAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTC^

CGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGGGACGTAATAGGGGATAATGCGTGCAACCT
AAAATTAAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAAACTAGAAGA

SEQ ID NO. 1813: SAG1600 FROM THE M781 GBS TYPE III STRAIN

AATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAACTTCT
TATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGC

SEQ ID NO. 1814: SAG1600 FROM THE JM9130013 GBS TYPE VIII STRAIN

TGGGCTGGCGGTTGTGTAAAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATAGTTCAATAAAACAGAAATAT

C^CGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGGGACGTAATAAGGGATAATGCGTG

CAACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAAACTAGAAGACATCTGATT

TGATTCC^CAATTGGAACT^TTTCGGACAAGCAAGGGATACCACAGCAGTATTTGGAGACAAAGCTTGAATTTTTTGACGAT;^

CTGATTTAACAGTCATGGGAGTACCTATAATACCAACTTTCCCTGAA



SEQ1801 AATCTTCATTGGAGATCAGGCTAGAGCT

SEQ1802 AAATGTTCCGTCAACTTCCAGAAGAGGAAGTAATCTTCATTGGAGATCAGGCTAGAGCT

SEQ1803 AATCTTCATTGGAGACCAGGCTAGAGCT 

SEQ1804 GCGGTTGTGTAAAAG-T 

SEQ1805 TTCCGTCAACTTCCAAAATATGAAGTAATCTTCATTGGAGATCAGGCTAGAGCT 

SEQ1806 -GTAATCTTCATTGGAGATCAGGCTAGAGCT

SEQ1807 CTTTTGGGCTGGCGGTTGTGTAAAAT-T 

SEQ1808 GTAATCTTCATTGGGGATCAGGCTAGAGCT 

SEQ1809 AAATGTTCCGTCAACTTCCAGAAGAGGAAGTAATCTTCATTGGAGATCAGGCTAGAGCT 

SEQ1810 ATTTCTTTAAAACCTTTTGGGCTGGCGGTTGTGTAATAT-T 

SEQ1811 ATTTCTTTAAAACCTTTTGGGCTGGCGGTTGTGTAATAAGT 

SEQ1812 GGCGGTTGTGTAAAAG-T 

SEQ1813 AATCTTCATTGGAGATCAGGCTAGAGCT 

SEQ1814 TGGGCTGGCGGTTGTGTAAAAG-T 

SEQ1801 CGTATGGTC-CTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAATTT 

SEQ1802 CGTATGGTC-CTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAACTT

SEQ1803 CGTATGGTC-CTAGACCTGCTCAACAGATTAGAGAGTT-ACCTGGCAGATGGTTAATTT 

SEQ1804 — GATGACCACCGTGTTTATTTTGCCAATTATGG — TTTATCTCA-AAATAGTTCA 

SEQ1805 CGTATGGTC-CTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAACTT

SEQ1806 CGTATGGTC-CTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAATTT 

SEQ1807 — GATGACCACCGTGTTTATTTTGCCAATTATGG — TTTATCTCA-AAATAGTTCA 

SEQ1808 CGTATGGTC-CTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAATTT 

SEQ1809 CGTATGGTC-CTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAACTT

SEQ1810 — GATGACCACCGTGTTTATTTTGCCAATTATGG — TTTATCTCA-AAATAGTTCA 

SEQ1811 — GATGACCACCGTGTTTATTTTGCCAATTATGG — TTTATCTCA-AAATAGTTCA 

SEQ1812 — GATGACCACCGTGTTTATTTTGCCAATTATGG — TTTATCTCA-AAATAGTTCA 

SEQ1813 CGTATGGTC-CTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAACTT 

SEQ1814 — GATGACCACCGTGTTT AT T TTGCCAAT TAT GG — TTTATCTCA-AAATAGTTCA 
SEQ1801 TTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATAGAGCAACTGCAGTTGC
SEQ1802 TTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGC
SEQ1803 TTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGC
SEQ1804 --ATAAAACAGAAATATCACGAACGGT-TTCTGCGCCACTATCAATTAATTTAACCTCA
SEQ1805 TTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGC
SEQ1806 TTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGC
SEQ1807 --ATAAAACAGAAATATCACGAACGGT-TTCTGCGCCACTATCAATTAATTTAACCTCA
SEQ1808 TTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTT--
SEQ1809 TTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGC
SEQ1810 --ATAAAACAGAAATATCACGAACGGT-TTCTGCGCCACTATCAATTAATTTAACCTCA
SEQ1811 --ATAAAACAGAAATATCACGAACGGT-TTCTGCGCCACTATCAATTAATTTAACCTCA
SEQ1812 --ATAAAACAGAAATATCACGAACGGT-TTCTGCGCCACTATCAATTAATTTAACCTCA
SEQ1813 TTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGC
SEQ1814 --ATAAAACAGAAATATCACGAACGGT-TTCTGCGCCACTATCAATTAATTTAACCTCA

SEQ1801 TGGCAAGAAATTAAAGAAAAACTAGACGTGCCTGTTTTAGGCGTTATTTTACCAGGAGC
SEQ1802 TGGCAAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGC
SEQ1803 TGGCAAGAAATTAAAGAAAAACTAGACATACCTGTTTTAGGCGTTATTTTACCAGGAGC
SEQ1804 CCCCCATAACATTTTGAATGATGGGACGTAATAGGGGATAATGC-GTGCAACCTAAAAT
SEQ1805 TGGCAAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGC
SEQ1806 TGGCAAGAAATTAAAGAAAAACTAGACATAC
SEQ1807 CCCCCATAACATTTTGAATAATGGGACGTAATAGGGGATAATGC-GTGCAACCTAAAAT
SEQ1808
SEQ1809 TGGCAAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGC
SEQ1810 CCCCCATAACATTTTGAATGATGGGACGTAATATGGGATAATGC-GTGCAACCTAAAAT
SEQ1811 CCCCCATAACATTTTGAATGATGGGACGTAATAGGGGATAATGC-GTGCAACCTAAAAT
SEQ1812 CCCCCATAACATTTTGAATGATGGGACGTAATAGGGGATAATGC-GTGCAACCTAAAAT
SEQ1813
SEQ1814 CCCCCATAACATTTTGAATGATGGGACGTAATAAGGGATAATGC-GTGCAACCTAAAAT

SEQ1801 AGCGCAGCTATCAAATCAACTAATTCAGGGAAAGTTGGTATTATAGGTACTCCCATGAC
SEQ1802 AGCGCAGCTATCAAATCAACTAATTTAGGGAAAGTTGGTATTATAGGTACTCCCATGAC
SEQ1803 AGCGCAGCTATCAAATCAACTAATTCAGGGAAAGTTGGTATTATAGGTACTCCCATGAC
SEQ1804 AAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAA
SEQ1805 AGCGCAGCTATCAAATCAACTAATTTAGGGAAAGTTGGTATTATAGGTACTCCCATGAC
SEQ1806
SEQ1807 AAAGTATCTAATTTACCAACTAATGGGGACAATGTTTCATAAACCACCTTTTTGGCTAA
SEQ1808
SEQ1809 AGCGCAGCTATCAAATCAACTAATTTAGGGAAAGTTGGTATTATAGGTACTCCCATGAC
SEQ1810 AAAGTA
SEQ1811 AAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAA
SEQ1812 AAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAA
SEQ1813
SEQ1814 AAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAA

SEQ1801 GTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATACTGCTGTGGT
SEQ1802 GTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATACTGCTGTGGT
SEQ1803 GTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATACTGCTGTGGT
SEQ1804 CTAGAAGACATCTGATTTGATTCCACAATTGGAACAAATTTCGGACAAGCAAGGGATAC
SEQ1805 GTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATACTGCTGTGGT
SEQ1806
SEQ1807 CTAGAAGACATCTGATTTGATTCCACAATTGGAACAAATTTCGGACAAGCAAGGGATAC
SEQ1808
SEQ1809 GTTAAATCAGATGCTTATCGTCAAAAAATTCAAGC
SEQ1810
SEQ1811 CTAGAAGACATCTGATTTGATTCCACAATTGGAACAA
SEQ1812 CTAGAAGA
SEQ1813
SEQ1814 CTAGAAGACATCTGATTTGATTCCACAATTGGAACAAATTTCGGACAAGCAAGGGATAC

SEQ1801 TCCCTTGCTTGTCCGAAATTTGTTCCAATTGTGGAATCAAATCAGATGTCTTCTAGTTT
SEQ1802 TCCCTTGCTTGTCCGAAATTTGTTCCAATTGTGGAATCAAATCAGATGTCTTCTAGTTT
SEQ1803 TCCCTTGCTTGTCCGAAATTTGTTCCAATTGTGGAATCAAATCAGATGTCTTCTAGTTT
SEQ1804 ACAGCAGTATTTGGAGACAAAGCTTGAATTTTTTGACGATAAGCATCTGATTTAACAGT
SEQ1805 TCCCTTGCTTGTCCGAAAT
SEQ1806
SEQ1807 ACAGCAGTATTTGGAGACAAAGCTTGAATTTTTTGACGATAAGCATCTGATTTAACAGT
SEQ1808
SEQ1809
SEQ1810
SEQ1811
SEQ1812
SEQ1813
SEQ1814 ACAGCAGTATTTGGAGACAAAGCTTGAATTTTTTGACGATAAGCATCTGATTTAACAGT

SEQ1801 GCCAAAAAGGTGGTTTATGAAACGTTGTCCCCATTAGTTGGTAAATTAGATACTTTAAT
SEQ1802 GCCAAAAAGGTGGTTTATGAAACGTTGTCCCCATTAGTTGGTAAATTAGATACTTTAAT
SEQ1803 GCCAAAAAGGTGGTTTATGAAACGCTGTCCCCATTAGTTGGTAAATTAGATACTTTAAT
SEQ1804 ATGGGAGTACCTATAATACCAACTTTCCCTAAATTAGTTGATTTGATAGCTGCGCTAGC
SEQ1805
SEQ1806
SEQ1807 ATGGGAGTACCTATAA-
SEQ1808
SEQ1809
SEQ1810
SEQ1811
SEQ1812
SEQ1813
SEQ1814 ATGGGAGTACCTATAATACCAACTTTCCCTGAA

SEQ1801 TTAGGTTGCACGCATTATCCCTTATTACGTCCCATCATTCAAAATGTTATGGGGGCTGA
SEQ1802 TTAGGTTGCACGCATTATCCCCTATTACGTCCCATCATTCAAAATGTTATGGGGGCTGA
SEQ1803 TTAGGTTGCACGCATTATCCCTTATTACGTCCCATCATTCAAAATGTTATGGGGGCTGA
SEQ1804 CCTGGTAAAATAACGCCTAAAACAGGGATGTCTAGTTTTTCTTTAATTTCTTGCCAGGC
SEQ1805
SEQ1806
SEQ1807
SEQ1808
SEQ1809
SEQ1810
SEQ1811
SEQ1812
SEQ1813
SEQ1814

SEQ1801 GTTAAATTAATTGATAGTGGCGCAGAAACCGTTCGTGATATTTCTGTTTTATTGAACTA
SEQ1802 GTTAAATTAATTGATAGTGGCGCAGAAACCGTTCGTGATATTTCTGTTTTATTGAACTA
SEQ1803 GTTAAATTAATTGATAGTGGCGCAGAAACCGTTCGTGATATTTCTGTTTTATTGAACTA
SEQ1804 ACTGCAGTTGCTGTATTACAAGCTATAACAATCATCTTAACATTTTTAGTCAATAAGAA
SEQ1805
SEQ1806
SEQ1807
SEQ1808
SEQ1809
SEQ1810
SEQ1811
SEQ1812
SEQ1813
SEQ1814

SEQ1801 TTTGAGATAAACCATAATTGGCAAAATAAACACGGTGGTCATCACTTTTACACAACCGC
SEQ1802 TTTGAGATAAACCATAATTGGCAAAATAAACACGGTGGTCATCACTTTTACACAACCGC
SEQ1803 TTTGAGATAAMCCATAATTGGSMAAATAAACACGGTGGTCATCACTTTTACACAACCGS
SEQ1804 TTAACCATCTGCCAGGTAAACTCTCTAATCTGTTGAGCAGGTCTAGGACCATACGGAGC
SEQ1805
SEQ1806
SEQ1807
SEQ1808
SEQ1809
SEQ1810
SEQ1811
SEQ1812
SEQ1813
SEQ1814

SEQ1801 AGCCCAA
SEQ1802 AGCCCAAAAGGTTTTAAAGAAA
SEQ1803 AGCCCAAAAGGTTTTTAAGGAAATTGCAGAACAATGGCTTAATCAAGAAATAAAT
SEQ1804 CTAGCCTGATCTCCAATGAAGATTACTTCCTCTTCTGGAAGTTGACGGAACATTTCCTT
SEQ1805
SEQ1806
SEQ1807
SEQ1808
SEQ1809
SEQ1810
SEQ1811
SEQ1812
SEQ1813
SEQ1814

SEQ1801
SEQ1802
SEQ1803
SEQ1804 ACAACCGTTAAACCACCT
SEQ1805
SEQ1806
SEQ1807
SEQ1808
SEQ1809
SEQ1810
SEQ1811
SEQ1812
SEQ1813
SEQ1814

Table 19: Comparative Sequences relating to SAG1680 (shikimate 5-dehydrogenase)

SEQ ID NO. 1901: SAG1680 FROM THE 2603 V/R GBS TYPE V STRAIN 

ATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAAC 
CAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTA 
AACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACT
ACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTGTTACGATTAAATAA 
TCTAATTTCCGCAACTCCCTCCATAGCTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTA 
TTTTATTTTTAGCACTGAAACCTTGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAAACGT 
CCGGTTCCACCTTGATTAACGATAGTATTTACAGCACCCACTAATTTAGCTTGAGGAGATAAATCATCTAGCAAAGGGAT 
AACACTCTGTTTAAATGGCATTGAAACATTAACACCACGAATACCCAATGCCCTGACACCTCGAACAGCTTCTGTTAATT 
TACCCTCTTCTACTTCAAATGTCAGATAGGCATAATTCATGTTTTTTTCTTGAAAAGAGGTATTCCACATTAACGGGGAT 
AGAGAGTGGCGTGCAGG 

SEQ ID NO. 1902: SAG1680 FROM THE H36b GBS TYPE Ib STRAIN

GTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCT 
AACAAATCGTAACAATGCTGTTtCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGAT 
CGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCG 
TCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTC 
AATGACCTTATCGTAATTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAGCTGCTTGAACTGCAA
CTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTATTTTATTTTTAGCACTGAAACCTTGAGCTGCTAAAGCTTTA 
AAACAACCAATGCCATCTGTCATATGGCCTACTAAACGTCCGGTTCCACCTTGATTAACGATAGTATTTACAGCACCCAC 
TAATTTAGCTTGAGGAGATAAATCATCTAGCAAAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACACCACGAA 
TACCCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTTCTACTTCAAATGTCAGATAGGCATAATTCATG 
TTTTTTTCTTGAAAAGAGGTATTCCACATTAACGGGGATAGAGAGTGGCGTGCAGGA 

SEQ ID NO. 1903: SAG1680 FROM THE M732 GBS TYPE III STRAIN 

CTGGTCTAATTGCCAATCCTGCACGCCACTCTCTATCCCCGTTAATGTGGAATACCTCTTTTCAAGAAAAAAACATGAAT
TATGCCTATCTGACATTTGAAGTAGAAGAGGGTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGAGTATTCGTGG 
TGTTAATGTTTCAATGCCATTTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTG 
CTGTAAATACTATCGTTAATCAAGGTGGAACCGGACGTTTAGTAGGCCATATGACAGATGGCATTGGTTGTTTTAAAGCT 
TTAGCAGCTCAAGGTTTCAGTGCTAAAAATAAAATAATTACAATAGCTGGTATTGGTGGTTCAGGTAAAGCAGTTGCAGT 
TCAAGCAGCTATGGAGGGAGTTGCGGAAATTAGATTATTTAATCGTAACAGCTCAAATTACGATAAGGTCATTGACTTAT 
CAGATAAAATTAAAAAACAGTTTCAAATAAAGGTAGTCGTTGATTATCTAGAAAATAAGACAGCATTTAAAGACGCTATT
AGAACTAGTCATTTTTATATTGATGCTACTAGTTTAGGAATGAGGCCATTAGATAATTATAGTTTAATTAACGATCCAGA
TATTTTAACACCGAATTTAGTAGTTGTCGACTT 

SEQ ID NO. 1904: SAG1680 FROM THE M781 GBS TYPE III STRAIN 

AAATCAGCATCCCTAGACATTATAAGCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCT 
TGTAAACCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTC 
ATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATC
AACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTGTTACGAT 
TAAATAATCTAATTTCCGCAACTCCCTCCATAGCTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTATT 
GTAATTATTTTATTTTTAGCACTGAAACCTTGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTAC 
TAAACGTCCGGTTCCACCTTGATTAACGATAGTATTTACAGCACCCACTAATTTAGCTTGAGGAGATAAATCATCTAGCA 
AAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACACCACGAATACTCAATGCCCTGACACCTCGAACAGCTTCT 
GTTAATTTACCCTCTTCTACTTCAAATGTCAGATAGGCATAATTCATGTTTTTTTCTTGAAAAGAGGTATTCCACATTAA 
CGGGGATAGAGAGTGGCGTGCA 

SEQ ID NO. 1905: SAG1680 FROM THE 090 GBS TYPE Ia STRAIN

GTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCATTTAAACAGAGTGTTATCCCtTTGCTArA 
TGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTATCGTTAATCAAGGTGGAACCGsACGTTTAGTAGGCC 
ATATGACAGATGGCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCTAAAAATAAAATAGTTACAATAGCT
GGTATTGGTGGTTCAGGTAAAGCAGTTGCAGTTCAAGCAGCTATGGAGGGAGTTGCGGAAATTAGATTATTTAATCGTAA
TAGCTCAAATTACGATAAGGTCATTGACTTATCAGATAAAATTAAAAAACAGTTTCAAATAAAGGTAGTCGTTGATTATC 
TAGAAAATAAGACAGCATTTAAAGACGCTATTAGAACTAGTCATTTTTATATTGATGCTACTAGTTTAGGAATGArGCCA 
TTAGATAATTATAGTTTAATTAACGATCCAGAAATTTTAACACCCAATTTAGTAGTTGTCGACTTGGTTTACAAGCCTAA 
AGAAACAGCATTGTTACGATTTGTTAGACAAAATGGAGTGAAACATGCTTATAATGGTCTAGGGATGCTGATTTATCAAG 
GAGCAGA 

SEQ ID NO. 1906: SAG1680 FROM THE A909 GBS TYPE Ia STRAIN

CCCTAGACCATTATAATCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCA
AGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAA
CTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTAC
CTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTGTTACGATTAAATAATC 
TAATTTCCGCAACTCCCTCCATAGCTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTATT 
TTATTTTTAGCACTGAAACCTTGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAAACGTCC 
GGTTCCACCTTGATTAACGATAGTATTTACAGCACCCACTAATTTAGCTTGAGGAGATAAATCATCTAGCAAAGGGATAA
CACTCTGTTTAAATGGCATTGAAACATTAACACCACGAATACCCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTA 
CCCTCTTCTACTTCAAATGTCAGATAGGCATAATTCATGTTTTTTTCTTGAAAAGAGGTATTCCACATTAACGGGGATAG 

SEQ ID NO. 1907: SAG1680 FROM THE COH1 GBS TYPE Ia STRAIN

TGCACGCCACTCTCTATCCCCGTTAATGTGGAATACCTCTTTTAAGAAAAAAACATGAATTATGCCTATCTGACATTTGA 
AGTAGAAGAGGGTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGAGTATTCGTGGTGTTAATGTTTCAATGCCAT 
TTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACT 

SEQ ID NO. 1908: SAG1680 FROM THE CJB110 GBS NONTYPEABLE STRAIN

ATTCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTT 
GTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTGGGTGTTAAAATTTCT 
GGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAAT 
AGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATA 
AGTCAATGACCTTATCGTAATTTGAGCTATTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAACTGCTTGAACT 
GCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAACTATTTT 

SEQ ID NO. 1909: SAG1680 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

ACTCTCTATCCCCGTTAATGTGGAATACCTCTTTTCAAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAGAA 
GAGGGTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCATTTAAACA 
GAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTATCGTTAATCAAGGTG
GAACCGGACGTTTAGTAGGCCATATGACAGATGGCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCTAAA 
AATAAAATAGTTACAATAGCTGGTATTGGTG 

SEQ ID NO. 1910: SAG1680 FROM THE 1169NT1 GBS TYPE V STRAIN 

ATTCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTT 
GTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCT 
GGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAAT 
AGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATA 
AGTCAATGACCTTATCGTAATTTGAGCTGTTACGAT 

SEQ ID NO. 1911: SAG1680 FROM THE 1169NT1 GBS TYPE V STRAIN 

ACTTCTCTATTCCCCGTTAATGTGGAATACCTCTTTTCAAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAG 
AAGAGGGTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCATTTAAA 
CAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTATCGTTAATCAAGG 
TGGAACC 

SEQ ID NO. 1912: SAG1680 FROM THE 18RS21 GBS TYPE II STRAIN 

TCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCATCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGT 
CTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGG 
ATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAG 
CGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAG 
TCAATGACCTTATCGTAATTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAAC 

SEQ ID NO. 1913: SAG1680 FROM THE 18RS21 GBS TYPE II STRAIN

ATGCCTATCTGACATTTGAAGTAGAAGAGGGTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGT 
GTTAATGTTTCAATGCCATTTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGC 
TGTAAATACTATCGTTAATCAAGGTGGAACCGGACGTTTAGTAGGCCATATGACAGATGGCATTGGTTGTTTTAAAGCTT 
TAGCAGCTCAAGGTTTCAGTGCTAAAAATAAAATAATTACAATAGCTGGTATTGGTGGTTCAGGTAAAGCAGTTGCAGTT 
CAAGCAGCTATGGAGGGAGTTGCGG 
Table 19: Comparative Sequences relating to SAF1680 (shikimate 5-dehydrogenase)

SEQ ID NO. 1914: SAG1680 FROM THE JM9130013 GBS TYPE VIII STRAIN

CCCTAGACCATTATAAGTCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACC 
AAGTCGACAACTACTAAATTGGGTGTTAAAATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAA 
ACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTA 
CCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTATTACGATTAAATAAT 
CTAATTTCCGCAACTCCCTCCATAGCTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAACTAT 
TTTATTTTTAGCACTGAAACCTTGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCAT

SEQ1901 ATCCCT 

SEQ1902 GTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCT

SEQ1903 TGGTCTAATTGCCAATCCTGCACGCCACTCTCTAT-CCCCGTTAATGTGGAATACCTCT

SEQ1904 AAATCAGCATCCCT

SEQ1905 

SEQ1906 CCCT 

SEQ1907 TGCACGCCACTCTCTAT-CCCCGTTAATGTGGAATACCTCT 

SEQ1908 ATTCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCT 

SEQ1909 ACTCTCTAT-CCCCGTTAATGTGGAATACCTCT

SEQ1910 ATTCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCT

SEQ1911 ACTTCTCTATTCCCCGTTAATGTGGAATACCTCT

SEQ1912 TCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCATCATCCCT 

SEQ1913 

SEQ1914 CCCT 

SEQ1901 GACCATTATAAG-CATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTC 

SEQ1902 GACCATTATAAG-CATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTC 

SEQ1903 TTCAAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAGAAGAGGGTAAATTA

SEQ1904 GAC-ATTATAAG-CATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTC 

SEQ1905 

SEQ1906 GACCATTATAAT-CATGTTTCACTCCATTTTGTCTAACAAATCGTMCAATGCTGTTTC 

SEQ1907 TT-AAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAGAAGAGGGTAAATTA

SEQ1908 GACCATTATAAG-CATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTC 

SEQ1909 TTCAAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAGAAGAGGGTAAATTA 

SEQ1910 GACCATTATAAG-CATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTC 

SEQ1911 TTCAAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAGAAGAGGGTAAATTA 

SEQ1912 GACCATTATAAG-CATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTC 

SEQ1913 ATGCCTATCTGACATTTGAAGTAGAAGAGGGTAAATTA 

SEQ1914 GACCATTATAAGTCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTC 

SEQ1901 TTAGGCTTGTAAACCAAGTC--GACAACTACTAAATTCGGTGTTAAAATTTCTGGATCG

SEQ1902 TTAGGCTTGTAAACCAAGTC--GACAACTACTAAATTCGGTGTTAAAATTTCTGGATCG

SEQ1903 CAGAAGCTGTTCGAGGTGTCAGGGCATTGAGTATTCGTGGTGTTAATGTTTCAATGCCA

SEQ1904 TTAGGCTTGTAAACCAAGTC--GACAACTACTAAATTCGGTGTTAAAATTTCTGGATCG

SEQ1905 GTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCA

SEQ1906 TTAGGCTTGTAAACCAAGTC--GACAACTACTAAATTCGGTGTTAAAATTTCTGGATCG

SEQ1907 CAGAAGCTGTTCGAGGTGTCAGGGCATTGAGTATTCGTGGTGTTAATGTTTCAATGCCA

SEQ1908 TTAGGCTTGTAAACCAAGTC--GACAACTACTAAATTGGGTGTTAAAATTTCTGGATCG

SEQ1909 CAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCA

SEQ1910 TTAGGCTTGTAAACCAAGTC--GACAACTACTAAATTCGGTGTTAAAATTTCTGGATCG

SEQ1911 CAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCA

SEQ1912 TTAGGCTTGTAAACCAAGTC--GACAACTACTAAATTCGGTGTTAAAATTTCTGGATCG

SEQ1913 CAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCA

SEQ1914 TTAGGCTTGTAAACCAAGTC--GACAACTACTAAATTGGGTGTTAAAATTTCTGGATCG

SEQ1901 TT-AATTAAACTATAATTATCT AATGGCCTCATTCCT-AAACTAGTAGCATCAAT
SEQ1902 TT-AATTAAACTATAATTATCT AATGGCCTCATTCCT-AAACTAGTAGCATCAAT
SEQ1903 TTTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGT
SEQ1904 TT-AATTAAACTATAATTATCT AATGGCCTCATTCCT-AAACTAGTAGCATCAAT
SEQ1905 TTTAAACAGAGTGTTATCCCTTTGCTARATGATTTATCTCCTCAAGCTAAATTAGTGGGT
SEQ1906 TT-AATTAAACTATAATTATCT AATGGCCTCATTCCT-AAACTAGTAGCATCAAT
SEQ1907 TTTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGT
SEQ1908 TT-AATTAAACTATAATTATCT AATGGCCTCATTCCT-AAACTAGTAGCATCAAT
SEQ1909 TTTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGT
SEQ1910 TT-AATTAAACTATAATTATCT AATGGCCTCATTCCT-AAACTAGTAGCATCAAT
SEQ1911 TTTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGT
SEQ1912 TT-AATTAAACTATAATTATCT AATGGCCTCATTCCT-AAACTAGTAGCATCAAT
SEQ1913 TTTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGT
SEQ1914 TT-AATTAAACTATAATTATCT AATGGCCTCATTCCT-AAACTAGTAGCATCAAT

SEQ1901 TAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAAC
SEQ1902 TAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAAC
SEQ1903 CTGTAAATACTATCGTTAATCAAGGTGGAACCGGACGTTTAGTAGGCCATATGACAGAT
SEQ1904 TAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAAC
SEQ1905 CTGTAAATACTATCGTTAATCAAGGTGGAACCGSACGTTTAGTAGGCCATATGACAGAT
SEQ1906 TAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAAC
SEQ1907 CTGTAAATACT
SEQ1908 TAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAAC
SEQ1909 CTGTAAATACTATCGTTAATCAAGGTGGAACCGGACGTTTAGTAGGCCATATGACAGAT
SEQ1910 TAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAAC
SEQ1911 CTGTAAATACTATCGTTAATCAAGGTGGAACC
SEQ1912 TAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATGAAC
SEQ1913 CTGTAAATACTATCGTTAATCAAGGTGGAACCGGACGTTTAGTAGGCCATATGACAGAT
SEQ1914 TAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAAC

SEQ1901 ACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTA
SEQ1902 ACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTA
SEQ1903 GCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCTAAAAATAAAATAATT
SEQ1904 ACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTA
SEQ1905 GCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCTAAAAATAAAATAGTT
SEQ1906 ACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTA
SEQ1907
SEQ1908 ACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTA
SEQ1909 GCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCTAAAAATAAAATAGTT
SEQ1910 ACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTA
SEQ1911
SEQ1912 ACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTA
SEQ1913 GCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCTAAAAATAAAATAATT
SEQ1914 ACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTA

SEQ1901 TTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAGCTGCTTGAAC
SEQ1902 TTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAGCTGCTTGAAC
SEQ1903 CAATAGCTGGTATTGGTGGTTCAGGTAAAGCAGTTGCAGTTCAAGCAGCTATGGAGGGA
SEQ1904 TTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAGCTGCTTGAAC
SEQ1905 CAATAGCTGGTATTGGTGGTTCAGGTAAAGCAGTTGCAGTTCAAGCAGCTATGGAGGGA
SEQ1906 TTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAGCTGCTTGAAC
SEQ1907
SEQ1908 TTTGAGCTATTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAACTGCTTGAAC
SEQ1909 CAATAGCTGGTATTGGTG
SEQ1910 TTTGAGCTGTTACGAT
SEQ1911
SEQ1912 TTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAAC
SEQ1913 CAATAGCTGGTATTGGTGGTTCAGGTAAAGCAGTTGCAGTTCAAGCAGCTATGGAGGGA
SEQ1914 TTTGAGCTATTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAGCTGCTTGAAC

SEQ1901 GCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTATTTTATTTTTAGCACT
SEQ1902 GCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTATTTTATTTTTAGCACT
SEQ1903 TTGCGGAAATTAGATTATTTAATCGTAACAGCTCAAATTACGATAAGGTCATTGACTTA
SEQ1904 GCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTATTTTATTTTTAGCACT
SEQ1905 TTGCGGAAATTAGATTATTTAATCGTAATAGCTCAAATTACGATAAGGTCATTGACTTA
SEQ1906 GCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTATTTTATTTTTAGCACT
SEQ1907
SEQ1908 GCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAACTATTTT
SEQ1909
SEQ1910
SEQ1911
SEQ1912
SEQ1913 TTGCGG
SEQ1914 GCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAACTATTTTATTTTTAGCACT

SEQ1901 AAACCTTGAGCTGCTAAAGCTTTAAT^ACAACCAATGCCATCTGTCATATGGCCTACTAA
SEQ1902 AAACCTTGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAA
SEQ1903 CAGATAAAATTAAAAAACAGTTTCAAATAAAGGTAGTCGTTGATTATCTAGAAAATAAG
SEQ1904 AAACCTTGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAA
SEQ1905 CAGATAAAATTAAAAAACAGTTTCAAATAAAGGTAGTCGTTGATTATCTAGAAAATAAG
SEQ1906 AAACCTTGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAA
SEQ1907
SEQ1908
SEQ1909
SEQ1910
SEQ1911
SEQ1912
SEQ1913
SEQ1914 AAACCTTGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCAT

SEQ1901 CGTCCGGTTCCACCTTGATTAACGATAGTATTTACAGCACCCACTAATTTAGCTTGAGG
SEQ1902 CGTCCGGTTCCACCTTGATTAACGATAGTATTTACAGCACCCACTAATTTAGCTTGAGG
SEQ1903 CAGCATTTAAAGACGCTATTAGAACTAGTCATTTTTATATTGATGCTACTAGTTTAGGA
SEQ1904 CGTCCGGTTCCACCTTGATTAACGATAGTATTTACAGCACCCACTAATTTAGCTTGAGG
SEQ1905 CAGCATTTAAAGACGCTATTAGAACTAGTCATTTTTATATTGATGCTACTAGTTTAGGA
SEQ1906 CGTCCGGTTCCACCTTGATTAACGATAGTATTTACAGCACCCACTAATTTAGCTTGAGG
SEQ1907
SEQ1908
SEQ1909
SEQ1910
SEQ1911
SEQ1912
SEQ1913
SEQ1914

SEQ1901 GATAAATCATCTAGCAAAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACACC
SEQ1902 GATAAATCATCTAGCAAAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACACC
SEQ1903 TGAGGCCATTAGATAATTATAGTTTAATTAACGATCCAGATATTTTAACACCGAATTTA
SEQ1904 GATAAATCATCTAGCAAAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACACC
SEQ1905 TGARGCCATTAGATAATTATAGTTTAATTAACGATCCAGAAATTTTAACACCCAATTTA
SEQ1906 GATAAATCATCTAGCAAAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACACC
SEQ1907
SEQ1908
SEQ1909
SEQ1910
SEQ1911
SEQ1912
SEQ1913
SEQ1914

SEQ1901 CGAATACCCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTTCTACTTC
SEQ1902 CGAATACCCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTTCTACTTC
SEQ1903 TAGTTGTCGACTT
SEQ1904 CGAATACTCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTTCTACTTC
SEQ1905 TAGTTGTCGACTTGGTTTACAAGCCTAAAGAAACAGCATTGTTACGATTTGTTAGACAA
SEQ1906 CGAATACCCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTTCTACTTC
SEQ1907
SEQ1908
SEQ1909
SEQ1910
SEQ1911
SEQ1912
SEQ1913
SEQ1914

SEQ1901 AATGTCAGATAGGCATAATTCATGTTTTTTTCTTGAAAAGAGGTATTCCACATTAACGG
SEQ1902 AATGTCAGATAGGCATAATTCATGTTTTTTTCTTGAAAAGAGGTATTCCACATTAACGG
SEQ1903
SEQ1904 AATGTCAGATAGGCATAATTCATGTTTTTTTCTTGAAAAGAGGTATTCCACATTAACGG
SEQ1905 ATGGAGTGAAACATGCTTATAATGGTCTAGGGATGCTGATTTATCAAGGAGCAGA
SEQ1906 AATGTCAGATAGGCATAATTCATGTTTTTTTCTTGAAAAGAGGTATTCCACATTAACGG
SEQ1907
SEQ1908
SEQ1909
SEQ1910
SEQ1911
SEQ1912
SEQ1913
SEQ1914

SEQ1901 GATAGAGAGTGGCGTGCAGG-
SEQ1902 GATAGAGAGTGGCGTGCAGGA
SEQ1903
SEQ1904 GATAGAGAGTGGCGTGCA
SEQ1905
SEQ1906 GATAG
SEQ1907
SEQ1908
SEQ1909
SEQ1910
SEQ1911
SEQ1912
SEQ1913
SEQ1914

Table 20: Comparative Sequences relating to SAG1723 (signal peptidase I)



SEQ ID NO. 2001: SAG1723 FROM THE DK1 GBS TYPE Ia STRAIN

ATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTC 
ATCAAATATAAAAATGACACCTTAACTATTAACAATAAAAA^^ 

TAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCA 

GCGAATTTACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGT^ 

TTCAAAA 

SEQ ID NO. 2002: SAG1680 FROM THE CJB110 GBS NONTYPEABLE STRAIN REVERSE COMPLEMENT

TAAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAAGAGCT 
TTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATT^ 

AATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAA^ 
AAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTO 

CTGTCGTGCCTAAAGGCCACTATTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGTCCCTTCAAAAAATCA 
ACAATTGTGGGAG 

SEQ ID NO. 2003: SAG1680 FROM THE 18RS21 GBS TYPE II STRAIN

TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTC 

GTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAAT 

CACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATA^ 

ATTCGTATAACCmCTTTTCCAAGACCTAGC^CAAAGCrc^ 

GTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAAT^ 

TGTGGGAGAGGT 

SEQ ID NO. 2004: SAG1680 FROM THE 2603 V/R GBS TYPE V STRAIN REVERSE COMPLEMENT

AAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAG 

GTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTAT^ 

TGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTA^ 

AATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAATTTACTACT 
GTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGT 

SEQ ID NO. 2005: SAG1680 FROM THE M732 GBS TYPE III STRAIN REVERSE COMPLEMENT 

TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAAC^GCTAGTAGTTCTCAAACAAACAAAATAATCGA 

GGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCT^AATATAAAAATGAC^ 
CCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGAT 
TCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCAC^ 
GCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGA 

SEQ ID NO. 2006: SAG1680 FROM THE M781 GBS TYPE III STRAIN

TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGT 

GTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATAAAAATGA 

CACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACT^ 

TATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGC 

SEQ ID NO. 2007: SAG1680 FROM THE 1169NT1 GBS TYPE V STRAIN REVERSE COMPLEMENT 

TTGGTAAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGG^ 

GATATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCA 

TAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTA 

AGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTCT 

ACX2ACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGCCCCTTCAAAAA 
ATCAACG

SEQ ID NO. 2008: SAG1680 FROM THE H36b GBS TYPE Ib STRAIN REVERSE COMPLEMENT
TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAAC^GCTAGTAGTT 
GTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATC 
CACCTTAACH71TTAACAATAAAAAAACAGAAGAACCTTACCTGAAGGAATATACT 

ATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAG 
GTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGA 

SEQ ID NO. 2009: SAG1680 FROM THE 090 GBS TYPE Ia STRAIN REVERSE COMPLEMENT
TAAAGTTGACGGACACTCCATGGATCCAACn^TAGCTGACAAGGAACAG 

TTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATAAA 
AATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGG^^ 
AAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACTACTGACAGCAATGGCAGCA 
CTGTCGTGCCTAAAGGCCACTATTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGT



SEQ ID NO. 2010: SAG1680 FROM THE A909 GBS TYPE Ia STRAIN REVERSE COMPLEMENT
AAAGTTGACGGACACTCCATGGATCCAACTOTAGCTGACAAC^AAG 

TGTAGTGGCTAACGAAGAAGAAGGCGGCOUUVAGAAAAA^TTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCAT 
ATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACT 

AAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGC^ 

TGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGTCCCTTCAAAAAATCAA 
CG 



SEQ2001

SEQ2002 TAAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAG

SEQ2003 TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAG

SEQ2004 AAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAG

SEQ2005 TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAG

SEQ2006 TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAG

SEQ2007 TGGTAAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAG

SEQ2008 TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAG

SEQ2009 TAAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAG

SEQ2010 AAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAG

SEQ2001 ATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCG 

SEQ2002 TCTCAAACAAACAAAAATCAATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCG 

SEQ2003 TCTCAAACAAACAAAAATCAATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCG 

SEQ2004 TCTCAAACAAACAAAAATCAATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCG 

SEQ2005 TCTCAAACAAACAAAA--TAATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCG

SEQ2006 TCTCAAACAAACAAAAATCAATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCG 

SEQ2007 TCTCAAACAAACAAAAATCAATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCG

SEQ2008 TCTCAAACAAACAAAAATCAATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCG 

SEQ2009 TCTCAAACAAACAAAAATCAATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCG 

SEQ2010 TCTCAAACAAACAAAAATCAATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCG 

SEQ2001 GCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATA 

SEQ2002 GCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATA 

SEQ2003 GCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATA 

SEQ2004 GCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATA 

SEQ2005 GCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATA

SEQ2006 GCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATA

SEQ2007 GCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATA 

SEQ2008 GCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATA 

SEQ2009 GCCAAAAGAT^AAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATA 

SEQ2010 GCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATA 

SEQ2001 AAAATGACACCTTAACTATTAACAATAAAAT^AACAGAAGAACCTTACCTCAAGGAATATA 

SEQ2002 AAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATA

SEQ2003 AAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATA 

SEQ2004 AAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATA 

SEQ2005 AAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATA

SEQ2006 AAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATA 

SEQ2007 AAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATA 

SEQ2008 AAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATA 

SEQ2009 AAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATA

SEQ2010 AAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATA 

SEQ2001 CTAAATTATTT-AAAAAGGATAAATTACAGGAAA7VATATTCGTATAACCCACTTTTCCAA

SEQ2002 CTAAATTATTT-AAAAAGGATAAATTACAGGAT^AAATATTCGTATAACCCACTTTTCCAA 

SEQ2003 CTAAATTATTT-AAAAAGGATAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAA

SEQ2004 CTAAATTATTT-AAAAAGGATAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAA 

SEQ2005 CTAAATTATTT-AAAAAGGATAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAA

SEQ2006 CTTIAATTATTTTAAAAAGGATAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAA 

SEQ2007 CTAAATTATTT-AAAAAGGATAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAA 

SEQ2008 CTAAATTATTT-AAAAAGGATAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAA

SEQ2009 CTAAATTATTT-AAAAAGGATAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAA 

SEQ2010 CTAAATTATTT-AAAAAGGATAAATTACAGGAA/^AATATTCGTATAACCCACTTTTCCAA 

SEQ2001 GACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAATTTACT
SEQ2002 GACCTAGCACAAAGCTCTACCGCTTTCACTACTGACAGCAATGGCAGCAGCGAATTTACT
SEQ2003 GACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAATTTACT
SEQ2004 GACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAATTTACT
SEQ2005 GACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAATTTACT
SEQ2006 GACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAATTTACT
SEQ2007 GACCTAGCACAAAGCTCTACCGCTTTCACTACTGACAGCAATGGCAGCAGCGAATTTACC
SEQ2008 GACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAATTTACT
SEQ2009 GACCTAGCACAAAGCTCTACCGCTTTCACTACTGACAGCAATGGCAGCAGCGAATTTACT
SEQ2010 GACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAATTTACT

SEQ2001 CTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGAT
SEQ2002 CTGTCGTGCCTAAAGGCCACTATTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGAT
SEQ2003 CTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGAT
SEQ2004 CTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGAT
SEQ2005 CTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGA
SEQ2006
SEQ2007 CTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGAT
SEQ2008 CTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGA
SEQ2009 CTGTCGTGCCTAAAGGCCACTATTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGAT
SEQ2010 CTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGAT

SEQ2001 GTCGTGCCGTCGGTTCCTTCAAAA
SEQ2002 GTCGTGCCGTCGGTCCCTTCAAAAAATCAACAATTGTGGGAG
SEQ2003 GTCGTGCCGTCGGTCCCTTCAAAAAATCAACGATTGTGGGAGAGGT
SEQ2004 GTCGTGCCGTCGGT
SEQ2005
SEQ2006
SEQ2007 GTCGTGCCGTCGGCCCCTTCAAAAAATCAACG
SEQ2008
SEQ2009 GTCGTGCCGTCGGT
SEQ2010 GTCGTGCCGTCGGTCCCTTCAAAAAATCAACG



Table 21: Comparative Sequences relating to SAG0079 (adenylate kinase)

SEQ ID NO. 2101: SAG0079 FROM THE 2603V/R GBS TYPE V STRAIN

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTA^GGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTCA 

ACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGi^ATTGGTT 

CCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCA 

CGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGAT 

CC^TCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTA 

GATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGA 

GAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCAC^ 

GAAAAAGCGTTG 

SEQ ID NO. 2102: SAG0079 FROM THE 090 GBS TYPE Ia STRAIN REVERSE COMPLEMENT

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAA 

ACAGGGGATATGTTCCGCGCCGCAATGGCTAATC^AACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTT 

CCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCA 

CGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATA 

CCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTA 

GATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGA 

GAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCAGATGTT 

GAAAAAGCGTTGCTAGAACTCAAA 

SEQ ID NO. 2103: SAG0079 FROM THE 1169NT1 GBS TYPE V STRAIN REVERSE COMPLEMENT
TGGTAAAGGGACTC^GCAGCTAAGATTGTTGAAGAATTTGGTGTTGCGC^CATCTCAACAGGGGATATGTTCCGCGCCGCAATGGC 
TAATCaVAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTO 
AGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGGTATC^ 

TGCTACGCTTG7^AGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTTATAGAGCGTTTGAGTGG 
TCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCC^CCAGTAGATT 
TGAAGATGATAAGCCTGAAACTGTC^VAACXSTCGCTTGGACGTTCATATTGCTCAAGGAGAACCT 
TGGCCTTGTTACAGATATTGAAGGTAATCAAGAAATAA 



SEQ ID NO. 2104: SAG0079 FROM THE 18RS21 GBS TYPE II STRAIN REVERSE COMPLEMENT

AATCTTTTAACCACGGGTTCGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGT^AGAATTTGGTGTTGCTCACATCTCA 
ACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTT 
CCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCA 
CGTACTATTGAACAAGCAC^CGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGAT 
CCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCX3TAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTA 
GATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGA 
GAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCA^ 
GAAAAAGCGTTGCTAGAA

SEQ ID NO. 2105: SAG0079 FROM THE 2603V/R GBS TYPE V STRAIN REVERSE COMPLEMENT

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCT^ 

ACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTT 
CCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCA 
CGTACTATTGAACAAGC^C^CGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCT^ 
CCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACT 

GATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 

GAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCAGATGT^ 

GAAAAAGCGTTG 

SEQ ID NO. 2106: SAG0079 FROM THE A909 GBS TYPE Ia STRAIN REVERSE COMPLEMENT

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTCA 

ACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTT 

CCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGA 

CGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGAT 

CCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCC^CAAAGTGTTCAACCCAC^ 

GATTATAAAGAAGAAGATTACTATC^CGTGAAGATGATAAGC^ 

GAATCTATTCTTGAACACTATCGAAAGCTTGGTCTTGTTACAGATATTGAAGGTAA 

SEQ ID NO. 2107: SAG0079 FROM THE CJB110 GBS NONTYPEABLE STRAIN REVERSE COMPLEMENT
AATCTTTTAACCACGGGTTTGCTTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTT 
ACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAA 
CCTGATGAAGTAAOU\ACGGGATTGTAAAAGAGCGCTTAGCTGAGG 

CGTACTATTGAAC^UVGCACACGCCTTAGATGCTAOTCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGA^ 
CCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTA 
GATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGG^ 
GAACCTATTCTTGAACACTATAG 

SEQ ID NO. 2108: SAG0079 FROM THE COH1 GBS TYPE III STRAIN REVERSE COMPLEMENT
ATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTGAAGAATTTGGT^ 

C^GGGGATATGTTCCGCGCCGCMTGGCTAATOVAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTC 

CTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGC^GAAAAAGGTTTTTTACTTGATGGATATCCAC 

GTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATC 

CAACATGCCTTATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTAG 

ATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 

AACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCAGATGTTG 

AAAAAGCGTTGCTAG

SEQ ID NO. 2109: SAG0079 FROM THE H36b GBS TYPE Ib STRAIN REVERSE COMPLEMENT

CAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAAT 
CTGATGAAGTAACAAACGGGATTGTAAAAGAGCG<nTAGCTGAGGATGATATCGCAGAAAAAGGTCT 

GTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTG7VAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATC 
CATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATC7U\TCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTAG 
ATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAG 
AATCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTAC^GATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCAGATGTTG 
AAAAAGCGTTGCfr 

SEQ ID NO. 2110: SAG0079 FROM THE JM9130013 GBS TYPE VIII STRAIN REVERSE COMPLEMENT

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGT7VAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTCA 

ACAGGGGATATGTTCCGCGCCGCAATGGCTAATC^U\ACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTT 

CCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCA 

CGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGAT 

CC^TCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTA 

GATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTTAAACGTCGCTTGGACGTTAATATTGCTCAAGGA 

GAACCTATTCTTGAACACTATAAAAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCA 



SEQ ID NO. 2111: SAG0079 FROM THE M732 GBS TYPE III STRAIN REVERSE COMPLEMENT 

CTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGA 

GGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCCAAATGGGACGTTTAGCTAA^GTTATATTGATAAAGGTGAATTGGTTCCT 
GATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCACGT 
ACTATTGAGC^GCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCT 

ACATGCCTTATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTAGAT 
TATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAA 
CCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAAC^GAAGTTTTTGC^ 
AAAGCGTTGCTAGAACTCAAA 



SEQ ID NO. 2112: SAG0079 FROM THE M781 GBS TYPE III STRAIN REVERSE COMPLEMENT 

AATCTTTTAATTACGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTGAAGAATTTGGTGTTGCTCACATCTC^ 
ACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCCAAATGGGACGTTTAGCra 

CCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCC^ 
CGTACTATTGAGCAAGGACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCT 
CCAACATGCCTTATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACT 
GATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCT 



SEQ2101 ATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTT

SEQ2102 ATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTT 

SEQ2103 TGGTAAAGGGACTCAAGCAGCTAAGATTGTT

SEQ2104 ATCTTTTAACCACGGGTTCGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTT 

SEQ2105 ATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTT

SEQ2106 ATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTT 

SEQ2107 ATCTTTTAACCACGGGTTTGCTTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTT 

SEQ2108 ATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTT

SEQ2109 

SEQ2110 ATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTT 

SEQ2111 --CTTTTAATTATGGGTTTGCCTGGTGCTGGTAT^AGGTACTCAAGCAGCTAAGATTGTT

SEQ2112 ATCTTTTAATTACGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTT 

SEQ2101 AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAAT 

SEQ2102 AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAAT 

SEQ2103 AAGAATTTGGTGTTGCGCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAAT 

SEQ2104 AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAAT 

SEQ2105 AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAAT 

SEQ2106 AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAAT 

SEQ2107 AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAAT 

SEQ2108 AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAAT 

SEQ2109 CAGGGGATATGTTCCGCGCCGCAATGGCTAAT

SEQ2110 AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAAT 



2 



21: Comparative Sequences relating to SAG0079 (adenylate kinase) 



Ll 

SEQ2112 

SEQ2101 
SEQ2102 
SEQ2103 
SEQ2104 
SEQ2105 
SEQ2106 
SEQ2107 
SEQ2108 
SEQ2109 
SEQ2110 
SEQ2111 
SEQ2112 

SEQ2101 
SEQ2102 
SEQ2103 
SEQ2104 
SEQ2JL05 
SEQ2106 
SEQ2107 
SEQ2108 
SEQ2109 
SEQ2110 
SEQ2111 
SEQ2112 

SEQ2101 
SEQ2102 
SEQ2103 
SEQ2104 
SEQ2105 
SEQ2106 
SEQ2107 
SEQ2108 
SEQ2109 
SEQ2110 
SEQ2111 
SEQ2112 

SEQ2101 
SEQ2102 
SEQ2103 
SEQ2104 
SEQ2105 
SEQ2106 
SEQ2107 
SEQ2108 
SEQ2109 
SEQ21X0 
SEQ21X1 
SEQ2112 

SEQ210X 
SEQ2102 
SEQ2103 
SEQ2104 
SEQ2105 
SEQ2106 
SEQ2107 
SEQ2108 
SEQ2109 
SEQ2110 
SEQ2111 
SEQ2112 



AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAAT 
AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAAT 

CAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 
CAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 
CAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 
CAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 
CAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 
CAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 
CAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 
CAAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 
CAAACCGAJVATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 
CAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 
CAAACCCAAATGGGACGTTTAGCTA7\AAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 
CAAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGAT 

AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT 
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT 
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT 
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGT 

TTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTT 
TTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTT 
TTTTTACTTGATGGGTATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTT
TTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTT 
TTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTT 
TTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTT 
TTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTT 
TTTTTACTTGATGGATATCCACGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTT 
TTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTT 
TTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTT 
TTTTTACTTGATGGATATCCACGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTT 
TTTTTACTTGATGGATATCCACGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTT 

GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTT 
GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTT 
GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTT
GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTT 
GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTT 
GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTT 
GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTT 
GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCAACATGCCTT 
GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTT 
GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTT 
GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCAACATGCCTT 
GAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCAACATGCCTT 

ATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 
ATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 
ATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 
ATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 
ATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 
ATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 
ATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 
ATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 
ATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 
ATAGAGCGTTTGAGTGCTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 
ATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 
ATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTG 






Table 21:

SEQ2101

SEQ2102 
SEQ2103 
SEQ2104 
SEQ2105 
SEQ2106 
SEQ2107 
SEQ2108 
SEQ2109 
SEQ2110 
SEQ2111 
SEQ2112 



SEQ2101 
SEQ2102 
SEQ2103 
SEQ2104 
SEQ2105 
SEQ2106 
SEQ2107 
SEQ2108 
SEQ2109 
SEQ2110 
SEQ2111 
SEQ2112 

SEQ2101 
SEQ2102 
SEQ2103 
SEQ2104 
SEQ2105 
SEQ2106 
SEQ2107 
SEQ2108 
SEQ2109 
SEQ2110 
SEQ2111 
SEQ2112 

SEQ2101 
SEQ2102 
SEQ2103 
SEQ2104 
SEQ2105 
SEQ2106 
SEQ2107 
SEQ2108 
SEQ2109 
SEQ2110 
SEQ2111 
SEQ2112



Comparative Sequences relating to SAG0079 (adenylate kinase) 

TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT 
TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT 
TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT 
TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT 
TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT
TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT
TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT
TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT
TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT 
TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT 
TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT 
TTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCT 

GAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACAC 
GAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACAC
GAAACTGTCAAACGTCGCTTGGACGTTCATATTGCTCAAGGAGAACCTATTCTTGAACAC 
GAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACAC 
GAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACAC 
GAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAATCTATTCTTGAACAC 
GAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACAC 
GAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACAC 
GAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAATCTATTCTTGAACAC 
GAAACTGTTAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACAC 
GAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACAC 
GAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAA

ATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTT 
ATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTT 

ATAGTAAGCTTGGCCTTGTTACAGATATTGAAGGTAATCAAGAAATAA 

ATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTT 
ATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTT 

ATCGAAAGCTTGGTCTTGTTACAGATATTGAAGGTAA 

ATAG

ATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTT 
ATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTT 

ATAAAAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCA 

ATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTT 

CAGATGTTGAAAAAGCGTTG 

CAGATGTTGAAAAAGCGTTGCTAGAACTCAAA 



CAGATGTTGAAAAAGCGTTGCTAGAA- 
CAGATGTTGAAAAAGCGTTG 



CAGATGTTGAAAAAG CGTTGCT AG- 
CAGATGTTGAAAAAGCGTTGCT 



CAGATGTTGAAAAAGCGTTGCTAGAACTCAAA 
Table 21: Comparative Sequences relating to SAG0079 (adenylate kinase)

>SEQ ID NO 2150: 090 frame: 1

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH
YRKLGLVTDIEGNQEITEVFADVEKALLELK

>SEQ ID NO 2151: 114_1169NT frame: 2

GKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPDQVTNGIVKER
LAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCLIERLSGRIIN
RKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVHIAQGEPILEHYSKLGLVTDI
EGNQEI 

>SEQ ID NO 2152: 114_18RS21 frame: 1

NLLTTGSPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH
YRKLGLVTDIEGNQEITEVFADVEKALLE 

>SEQ ID NO 2153: 114_2603 frame: 1

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH
YRKLGLVTDIEGNQEITEVFADVEKAL

>SEQ ID NO 2154: 114_A909 frame: 1 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH
YRKLGLVTDIEG 

>SEQ ID NO 2155: 114_A909 frame: 1

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH
YRKLGLVTDIEG 

>SEQ ID NO 2156: 114_CJB110 frame: 1

NLLTTGLLGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH
Y 

>SEQ ID NO 2157: 114_COH1 frame: 3

LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPDE
VTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCLI 
ERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEHY 
RKLGLVTDIEGNQEITEVFADVEKALL 

>SEQ ID NO 2158: 114_H36B frame: 3 

GDMFRAAMANQTEMGRLAKSYIDKGELVPDEVTNGIVKERLAEDDIAEKGFLLDGYPRTI
EQAHALDATLEELGLRLDGVINIKVDPSCLIERLSGRIINRKTGETFHKVFNPPVDYKEE 
DYYQREDDKPETVKRRLDVNIAQGESILEHYRKLGLVTDIEGNQEITEVFADVEKAL 

>SEQ ID NO 2159: 114_JM9130013 frame: 1

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH
YKKLGLVTDIEGN 

>SEQ ID NO 2160: 114_M732 frame: 1

LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPDE
VTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCLI
ERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEHY
RKLGLVTDIEGNQEITEVFADVEKALLELK

>SEQ ID NO 2161: 114_M781 frame: 1

NLLITGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQ
Table 21: Comparative Sequences relating to SAG0079 (adenylate kinase)

SEQ2150 LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD

SEQ2151          GKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD

SEQ2152 LLTTGSPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD

SEQ2153 LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD

SEQ2154 LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD

SEQ2155 LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD

SEQ2156 LLTTGLLGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD

SEQ2157 LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPD

SEQ2158                              GDMFRAAMANQTEMGRLAKSYIDKGELVPD

SEQ2159 LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD

SEQ2160 LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPD

SEQ2161 LLITGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPD

SEQ2150 EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL

SEQ2151 QVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 

SEQ2152 EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 

SEQ2153 EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 

SEQ2154 EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 

SEQ2155 EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL

SEQ2156 EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 

SEQ2157 EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCL 

SEQ2158 EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL

SEQ2159 EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 

SEQ2160 EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCL

SEQ2161 EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCL 

SEQ2150 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH

SEQ2151 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVHIAQGEPILEH

SEQ2152 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH

SEQ2153 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH

SEQ2154 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH

SEQ2155 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH

SEQ2156 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH

SEQ2157 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH

SEQ2158 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH

SEQ2159 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH

SEQ2160 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH

SEQ2161 IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQ 

SEQ2150 RKLGLVTDIEGNQEITEVFADVEKALLELK 

SEQ2151 SKLGLVTDIEGNQEI

SEQ2152 RKLGLVTDIEGNQEITEVFADVEKALLE--

SEQ2153 RKLGLVTDIEGNQEITEVFADVEKAL

SEQ2154 RKLGLVTDIEG 

SEQ2155 RKLGLVTDIEG 

SEQ2156 

SEQ2157 RKLGLVTDIEGNQEITEVFADVEKALL

SEQ2158 RKLGLVTDIEGNQEITEVFADVEKAL 

SEQ2159 KKLGLVTDIEGN

SEQ2160 RKLGLVTDIEGNQEITEVFADVEKALLELK 

SEQ2161 
Table 22: Comparative Sequences relating to SAG0093
(D-alanyl-D-alanine carboxypeptidase family protein)




SEQ ID NO. 2201: SAG0093 FROM THE 090 GBS TYPE Ia STRAIN REVERSE COMPLEMENT
AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTC^^ 

CAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTT 
CCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAGAA 
CATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTT^ 

TTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGATGGAT 
ATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTA 
CGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAAAATAT 
ATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA 

SEQ ID NO. 2202: SAG0093 FROM THE 1169NT1 GBS TYPE V STRAIN REVERSE COMPLEMENT

AAGCCTAACAGTCAACAATCATCACCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATA

CGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTG 
CCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAGAA 
CATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAAT 
TTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGATGGAT 
ATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTA 
CGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAAAATAT
ATGGCCGAACATCGTTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA 

SEQ ID NO. 2203: SAG0093 FROM THE 18RS21 GBS TYPE II STRAIN 

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATC 

CAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAA

CCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAGAA 
CATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAAT 
TTGACGAGGGGACAAGC^GAAAAGTTGGTAAAAACTTACTCTCAGC^ 

ATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTA 
CGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAAAATAT
ATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA 

SEQ ID NO. 2204: SAG0093 FROM THE 2603V/R GBS TYPE V STRAIN 

ACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCT 

CAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTTCCTGTTG 
AAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAGAACATTTAA
TTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTGACGA 
GGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGATGGATATGAGTA 
CTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTC 
CGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAAAATATATGGCCA
AACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAAAAC^

SEQ ID NO. 2205: SAG0093 FROM THE A909 GBS TYPE Ia STRAIN

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACAT 

CGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTG 

CCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAGAA

CATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAAATGACTAGTAACCCTAAT 

TTGACGAAGGAACAAGCAGAAAAGTTGGTAAAAACTTACTCT 

ATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTC^CT^ 

CGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAAAATAT 
ATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA 

SEQ ID NO. 2206: SAG0093 FROM THE CJB110 GBS NONTYPEABLE STRAIN

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGA 

ACAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGT 
TCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGC^ 

ACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCC^

TTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGATGGA 

TATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTT

ACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATO^

TATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA 
Table 22: Comparative Sequences relating to SAG0093
(D-alanyl-D-alanine carboxypeptidase family protein)

SEQ ID NO. 2207: SAG0093 FROM THE COH1 GBS TYPE III STRAIN

CCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACA^ 

ATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTC

TGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAGAACA 

TTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTA 

GACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGA^ 

GAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACG
GTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAAAATATAT
GGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAAAACCCAGCTTTCTTGTACA 



SEQ ID NO. 2208: SAG0093 FROM THE H36b GBS TYPE Ib STRAIN

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACATCCTCTCAAA 
CGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACA 

CCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAGAA 

CATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAwGAAATGACTAGTA^ 

TTGACGAAGGAACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGATGGAT 

ATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTA 

CGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAAAATAT 

ATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2209: SAG0093 FROM THE JM9130013 GBS TYPE VIII STRAIN

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATTA
CAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTT 
CCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTA 

CATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAAT 

TTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGATGGAT 

ATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTA

CGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCA^

ATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA 



SEQ ID NO. 2210: SAG0093 FROM THE M732 GBS TYPE III STRAIN

AGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAA
GATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATC 

CTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAG

ATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATT 

TGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCA^^ 

TGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTAC 

GGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTC

TGGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAAAACCCAGCTTTCTT

SEQ ID NO. 2211: SAG0093 FROM THE M781 GBS TYPE III STRAIN

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAA

CGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTG 

CCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGA 

CATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACT

TTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGATGGAT 
ATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCT 

CGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAAAATAT 
ATGGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA 



SEQ2201 
SEQ2202 
SEQ2203 
SEQ2204 
SEQ2205 
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210 
SEQ2211 



AGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATA 
AGCCTAACAGTCAACAATCATCACCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATA 
AGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATA

ACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATA 

AGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACA 
AGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATA 
--CCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACA

AGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACA
AGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATA
AGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACA
AGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACA






Table 22: Comparative Sequences relating to SAG0093 
(D-alanyl-D-alanine carboxypeptidase family protein) 



SEQ2201
SEQ2202 
SEQ2203 
SEQ2204 
SEQ2205 
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210 
SEQ2211 

SEQ2201 
SEQ2202 
SEQ2203 
SEQ2204 
SEQ2205 
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210 
SEQ2211

SEQ2201 
SEQ2202
SEQ2203 
SEQ2204 
SEQ2205 
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210
SEQ2211 



TCCTCTCAAAAAAGAAAT-AAGAAATT-ACAATTACCAGCTGTATCATCAAAAGATTGGA 
TCCTCTCAAAAAAGAAAT-AAGAAATT-ACGATTACCAGCTGTATCATCAAAAGATTGGA 
TCCTCTCAAAAAAGAAAT-AAGAAATT-ACAATTACCAGCTGTATCATCAAAAGATTGGA 
TCCTCTCAAAAAAGAAAT-AAGAAATT-ACAATTACCAGCTGTATCATCAAAAGATTGGA 
TCCTCTCAAAAAAGAAAT-AAGAAATT-ACGATTACCAGCTGTATCATCAAAAGATTGGA 
TCCTCTCAAAAAAGAAAT-AAGAAATTTACAATTACCAGCTGTATCATCAAAAGATTGGA
TCCTCTCAAAAAAGAAATTAAGAAATT-ACGATTACCAGCTGTATCATCAAAAGATTGGA 
TCCTCTCAAAAAAGAAAT-AAGAAATT-ACGATTACCAGCTGTATCATCAAAAGATTGGA 
TCCTCTCAAAAAAGAAAT-AAGAAATT-ACAATTACCAGCTGTATCATCAAAAGATTGGA 
TCCTCTCAAAAAAGAAAT-AAGAAATT-ACGATTACCAGCTGTATCATCAAAAGATTGGA 
TCCTCTCAAAAAAGAAAT-AAGAAATT-ACGATTACCAGCTGTATCATCAAAAGATTGGA 

ACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTTCCTG 
ACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTGCCTG 
ACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTTCCTG
ACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTTCCTG 
ACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTGCCTG 
ACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTTCCTG 
ACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTGCCTG 
ACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTGCCTG 
ACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTTCCTG 
ACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTGCCTG 
ACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTGCCTG 

TTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTG 
TTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTG 
TTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTG 
TTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTG 
TTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTG 
TTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTG 
TTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTG 
TTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTG 
TTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTG 
TTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTG 
TTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTG 



SEQ2201 
SEQ2202 
SEQ2203 
SEQ2204 
SEQ2205 
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210
SEQ2211 

SEQ2201 
SEQ2202 
SEQ2203 
SEQ2204 
SEQ2205 
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210 
SEQ2211 



CTAGAGCAATTGATTCACGAGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGG 
CTAGAGCAATTGATTCACGAGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGG 
CTAGAGCAATTGATTCACGAGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGG
CTAGAGCAATTGATTCACGAGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGG
CTAGAGCAATTGATTCACGAGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGG
CTAGAGCAATTGATTCACGAGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGG
CTAGAGCAATTGATTCACGAGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGG
CTAGAGCAATTGATTCACGAGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGG
CTAGAGCAATTGATTCACGAGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGG
CTAGAGCAATTGATTCACGAGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGG 
CTAGAGCAATTGATTCACGAGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGG 

AGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTGACGAGGG 
AGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTGACGAGGG 
AGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTGACGAGGG 
AGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTGACGAGGG 
AGAAGTTGTTCAATTCTTATGTTACTCAAGAAATGACTAGTAACCCTAATTTGACGAAGG 
AGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTGACGAGGG 
AGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTGACGAGGG 
AGAAGTTGTTCAATTCTTATGTTACTCAWGAAATGACTAGTAACCCTAATTTGACGAAGG 
AGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTGACGAGGG 
AGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTGACGAGGG 
AGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTGACGAGGG 
Table 22: Comparative Sequences relating to SAG0093
(D-alanyl-D-alanine carboxypeptidase family protein)



SEQ2201 
SEQ2202 
SEQ2203 
SEQ2204 
SEQ2205 
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210 
SEQ2211 

SEQ2201 
SEQ2202 
SEQ2203 
SEQ2204 
SEQ2205
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210 
SEQ2211 

SEQ2201 
SEQ2202 
SEQ2203 
SEQ2204 
SEQ2205 
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210 
SEQ2211 

SEQ2201 
SEQ2202 
SEQ2203 
SEQ2204 
SEQ2205 
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210 
SEQ2211 

SEQ2201 
SEQ2202 
SEQ2203 
SEQ2204 
SEQ2205 
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210 
SEQ2211 



ACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGA 
ACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGA 
ACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGA 
ACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGA 
ACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGA
ACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGA 
ACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGA 
ACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGA 
ACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGA 
ACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGA 
ACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGA 

CTGGATTAGCGATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAG 
CTGGATTAGCGATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAG 
CTGGATTAGCGATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAG 
CTGGATTAGCGATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAG 
CTGGATTAGCGATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAG 
CTGGATTAGCGATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAG 
CTGGATTAGCGATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAG 
CTGGATTAGCGATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAG 
CTGGATTAGCGATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAG 
CTGGATTAGCGATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAG 
CTGGATTAGCGATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAG 

TCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTCCGGATGGTA 
TCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTCCGGATGGTA 
TCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTCCGGATGGTA 
TCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTCCGGATGGTA 
TCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTCCGGATGGTA 
TCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTCCGGATGGTA 
TCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTCCGGATGGTA 
TCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTCCGGATGGTA 
TCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTCCGGATGGTA 
TCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTCCGGATGGTA 
TCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTACGGTTTCCGGATGGTA 

AAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGT 
AAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGT 
AAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGT 
AAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGT 
AAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGT 
AAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGT 
AAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGT 
AAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGT 
AAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGT 
AAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGT 
AAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGT 

CTGCAAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGG
CTGCAAAATATATGGCCGAACATCGTTTAACATTAGAAGAATACATAACTTTATTAAAGG 
CTGCAAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGG 
CTGCAAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGG
CTGCAAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGG 
CTGCAAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGG 
CTGCAAAATATATGGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGG 
CTGCAAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGG 
CTGCAAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGG 
CTGCAAAATATATGGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGG 
CTGCAAAATATATGGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGG 
SEQ2201
SEQ2202 
SEQ2203 
SEQ2204 
SEQ2205 
SEQ2206 
SEQ2207 
SEQ2208 
SEQ2209 
SEQ2210 
SEQ2211 






Table 22: Comparative Sequences relating to SAG0093
(D-alanyl-D-alanine carboxypeptidase family protein)




AGAATAACCAA 

AGAATAACCAA 

AGAATAACCAA 

AGAATAACCAAAACCCAGCTTTCTTGTACAA 

AGAATAACCAA 

AGAATAACCAA 

AGAATAACCAAAACCCAGCTTTCTTGTACAA 

AGAATAACCAA

AGAATAACCAA 

AGAATAACCAAAACCCAGCTTTCTT 

AGAATAACCAA



>SEQ ID NO 2250: 18_090 frame: 1 

KPNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ 

>SEQ ID NO 2251: 18_1169NT frame: 1 

KPNSQQSSPQKLRNEDIKKISSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAEHRLTLEEYITLLKENNQ 

>SEQ ID NO 2252: 18_18RS21 frame: 1

KPNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK 
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ 

>SEQ ID NO 2253: 18_2603 frame: 3 

SQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPVENI 
YLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRGQAE
KLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGKTAE
TGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQNPAFLY 

>SEQ ID NO 2254: 18_A909 frame: 1 

KPNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTKE
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ

>SEQ ID NO 2255:18_CJB110 frame: 1 
KPNSQQSSSQKLRNEDIKKISSQKRNKKFTITSCIIKRLELDFGQS 

>SEQ ID NO 2256: 18_COH1 frame: 1
PNSQQSSSQKLRNEDIKKTSSQKRN 

>SEQ ID NO 2257: 18_H36B frame: 1 

KPNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTXEMTSNPNLTKE 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK 
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ 

>SEQ ID NO 2258: 18_JM9130013 frame: 1 

KPNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ 

>SEQ ID NO 2259:18_M732 frame: 3 

PNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPVE
NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRGQ
AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGKT
AETGVGYEDWHYRYVGVESAKYMVKHHLTLEEYITLLKENNQNPAF
Table 22: Comparative Sequences relating to SAG0093
(D-alanyl-D-alanine carboxypeptidase family protein)



>SEQ ID NO 2260: 18_M781 frame: 1 

KPNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK 
TAETGVGYEDWHYRYVGVESAKYMVKHHLTLEEYITLLKENNQ 



SEQ2250 
SEQ2251 
SEQ2252 
SEQ2253 
SEQ2254 
SEQ2255 
SEQ2256 
SEQ2257 
SEQ2258 
SEQ2259 
SEQ2260 



PNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV 
PNSQQSSPQKLRNEDIKKISSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
PNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV 
— SQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILWRDHKHEELSPDWPV 
PNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV 

PNSQQSSSQKLRNEDIKKISSQKRNKKFTITSCIIKRLEL DFGQS 

PNSQQSSSQKLRNEDIKKTSSQKRN 

PNSQQS S SQKLRNE DI KKTS SQKRNKKLRL P AVS SKDWNLI LVNRDHKHEELS PDWPV 
PNSQQSSSQKLRNEDIKKISSQKBINKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV 
PNSQQSSSQKLRNEDIKKTSSQKRNKKUU^PAVSSKDWNLILVNRDHKHEEI^PDVVPV 
PNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV 



SEQ2250 NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG
SEQ2251 NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLEHSYVTQEMTSNPNLTRG
SEQ2252 NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG
SEQ2253 NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG
SEQ2254 NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTKE
SEQ2255
SEQ2256
SEQ2257 NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTXEMTSNPNLTKE
SEQ2258 NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG
SEQ2259 NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG
SEQ2260 NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG



SEQ2250 AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
SEQ2251 AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
SEQ2252 AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
SEQ2253 AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
SEQ2254 AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
SEQ2255
SEQ2256
SEQ2257 AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
SEQ2258 AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
SEQ2259 AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
SEQ2260 AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK



SEQ2250 AETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ
SEQ2251 AETGVGYEDWHYRYVGVESAKYMAEHRLTLEEYITLLKENNQ
SEQ2252 AETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ
SEQ2253 AETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQNPAFLY-
SEQ2254 AETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ
SEQ2255
SEQ2256
SEQ2257 AETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ
SEQ2258 AETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ
SEQ2259 AETGVGYEDWHYRYVGVESAKYMVKHHLTLEEYITLLKENNQNPAF
SEQ2260 AETGVGYEDWHYRYVGVESAKYMVKHHLTLEEYITLLKENNQ






Table 23: Comparative Sequences relating to SAG0163
(competence protein CglA)



SEQ ID NO. 2301: SAG0163 FROM THE 090 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

GGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTATGAACTCTATATGCGTATTGATGATGAAAGGCG 

GTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAA 

ACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTGG 

TCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATC^GGACTTAAAATATTGGTTTGATAATATAAAGCAAA 

ACTGGGTACAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATT 

TAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGG 

AATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGC 

CCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTTCCGGAGTCTATGATAGGCT 

TATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAA 

TGACTTTGAGACAGGTAACTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAG 

TAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAACGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2302: SAG0163 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

GGTGATTGTTATGAAACCTCTACTATTGCGTATTTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTC 

TTATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAG 

AGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTC 

AT»GGACTTAAAATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTACAAGAGGGCTATATCTTTTTTCCGGCCCTG 

TGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTC^GAAGTATTTAAAAATAAGOVAATTATCACGATTGA 

AAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTA 

ATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCTCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGG 

TTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATA 

GTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGTAACTTTAAAAAACACTCA 

ACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGATATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAG 

AAACAACGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2303: SAG0163 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE COMPLEMENT)
GTTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAA 

GAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAA 
TTTGTGGCAGGCTVTGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTAT^ 

TTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATAT 
TGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACA 
ACTCTCATGTATCAATTAGCTTC^GAAGTATTTAAAAATAAGC^^ 
ATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACT 

ATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCAT 
GCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCA^ 

TATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACTVCTCATCAGAC^^GTGGAATAGACAA 
GTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAACGGAAAGTAGT 
CCAACTTTT 

SEQ ID NO. 2304: SAG0163 FROM THE 2603 V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

GATATTTATATCATTCCCAAAGGTGATTGTTATGAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTT 
AATAGGATGGCTAGTCTTATTAGTC^CTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGT 
GACTATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGT 
ATTTTGTATTCAGGTCATC^GGACTTAAAATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATAT 
CTOTTTTCCGGCCCTGTGGGGAGTGGTAAAACS^CTCTCATGTATCAATTAGCT^ 

ATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGAC 

AAACTGTCTTTACGGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGT 

TTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACT 

CAAGAGTTAGAAAATAGTCTAAAATTAATAGCATATCAACGTTTAA 

AAAAAAC^CTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAA 
AAAAATTATCCCTCAAGAAACAACGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2305: SAG0163 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)
GTTCAATCATTAGCAAAGCAAGTC^TTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTA^ 
GAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAA 
TTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGT^ 

TTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATAT 
TGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACA 
ACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAA^ 

ATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATGAAACTGTC^ 

ATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCAT 

GCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAATAGCA 

TATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACACT 

GTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATO 

CCAACTTTT 

SEQ ID NO. 2306: SAG0163 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT)

GTTCAATC^TTAGCAAAGCAAGTCATTCATC^GGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTAT 
GAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAA 
TTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTTT^ 
TTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATAT 
TGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTACAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACA 
ACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGAT^ 

ATGTTAW^CTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC^TCGTCCAGATATTTTA 

ATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCAT 

GCTAAAAGTATTTCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAATAGCA 

TATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAACTTTAAAAA^ 

GTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAAC^GGCACAAGTCGAAAAAATTATCC 

CCAACTTTT 

SEQ ID NO. 2307: SAG0163 FROM THE COH1 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

AGGTGATTGTTATGAAATTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTA 
TTAGTCACTTTAAATTTGTGGCAGGGkTGAACGTTGGAGAAAAAAGACGA^ 

GAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTCATC 
AGGACTTAAAATATTGGTTTGATAATATAAAGTAAATGAAGGAAGTACTGTGTGCAAGAGGGCTATATCTTTTTTCCGGCCCTGTGG 
GGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTC^GAAGTATTTAAAAATAAGCAAAT^ 

TCAAGAATGAC7VAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTCT 

GTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTAATGGTTT 

TTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATC^GAGTTAGAAAATAGTC 

TAAAATTAATAGC^TATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGTAACT 

AGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGC^^ 

CAACGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2308: SAG0163 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

TCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGAT 

TATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTC^CTTTAAATTTGTG 

GCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAG 

CTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGTTT 

GATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTA 

ATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGA 

CAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC^ 

GGAGAGAAATAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGT^ 

AAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTA 

ACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACACTCATCAGACAAGTGGAATAGA 

TATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATC 

TTTT 

SEQ ID NO. 2309: SAG0163 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

GTTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTAT 

GAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAA 

TTTGTGGCAGGCATGAACGTTGGAGAAAAAAGAOTAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACT 

TTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATAT 

TGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACA 

ACTCTCATGTATCAATTAGCTTmGAAGTATTTAAAAATAAGCAAATTATC^CGATTGAAGATCCGGTAGAAATCAAGAATGACAAG 

ATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGATATTTTA 

ATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCAT 

GCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAATAGCA 

TATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACACTCATCAGACAAGTGGAATAGACAA 

GTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCAC^GTCGAAAAAATTATCCCTCAAGAAACAACGGAAAGTAGT 

CCAACTTTT 

SEQ ID NO. 2310: SAG0163 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

TGACTTGTTATGAAACTCTATATGCGTATTTGATGATGAAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTA 
TTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGG 
GAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTCATC 
AGGACTTA7VAATATTGGTTTGATAATATAAAGTAAATGAAGGAAGTACTGTGTGCAAGAGGGCTATATCTTTTTTCCGGCCCTGTGG 
GGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAG 

TCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTT^ATCAAACTGTCTTTACGGCATC 
GTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTAATGGTTT 
TTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTC 
TAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGTAACTTTAAAAAACACTCATCAGACA 
AGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGAC^TATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAA 
CAACGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2311: SAG0163 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

CAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCC^AAGGTGATTGTTATGAATTCTATATGCGTATTGATGATGAAAGGCGGT 
TTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAATTO 

GAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTGGTC 
AAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTCATCAGGACTTAAAATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTAC 
TGTGTGQ\AGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACAACT 

AAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATC^GAATGACAAGATGTTACAACTCCAATTGAA 

TGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCC 

GTGCTGTTATTCGTGCAAGTTTAACGGGAGTAATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTA 

TAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTG 

ACTTTGAGACAAGTAACTTTAAAAAAC^CTCATCAGACAAGTGGAAT^ 

AGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAACGGAAAGTAGTCCAACTTTT 

SEQ2301 GGCAGTAGAAGTAAATGCTCAAGATATT 

SEQ2302 

SEQ2303 TTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATT 

SEQ2304 GATATT 

SEQ2305 TTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATT 

SEQ2306 TTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATT

SEQ2307 

SEQ2308 TCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATT 

SEQ2309 TTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATT 

SEQ2310 

SEQ2311 CAGTAGAAGTAAATGCTCAAGATATT 

SEQ2301 ATATCATTCCCAAAGGTGA-TTGTTATGAA-CTCTATA TGCGTATT-GATGATGA 

SEQ2302 GGTGA-TTGTTATGAA-ACCTCTACTATTGCGTATTTGATGATGA 

SEQ2303 ATATCATTCCCAAAGGTGA-TTGTTATGAA-CTCTATA TGCGTATT-GATGATGA

SEQ2304 ATATCATTCCCAAAGGTGA-TTGTTATGAA-CTCTATA TGCGTATT-GATGATGA 

SEQ2305 ATATCATTCCCAAAGGTGA-TTGTTATGAA-CTCTATA TGCGTATT-GATGATGA 

SEQ2306 ATATCATTCCCAAAGGTGA-TTGTTATGAA-CTCTATA TGCGTATT-GATGATGA 

SEQ2307 AGGTGA-TTGTTATGAAATTCTATA TGCGTATT-GATGATGA

SEQ2308 ATATCATTCCCAAAGGTGA-TTGTTATGAA-CTCTATA TGCGTATT-GATGATGA 

SEQ2309 ATATCATTCCCAAAGGTGA-TTGTTATGAA-CTCTATA TGCGTATT-GATGATGA 

SEQ2310 TGACTTGTTATGAAACTCTATA TGCGTATTTGATGATGA 

SEQ2311 ATATCATTCCCAAAGGTGA-TTGTTATGAA-TTCTATA TGCGTATT-GATGATGA 

SEQ2301 AA-GGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 

SEQ2302 AA-GGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA

SEQ2303 AA-GGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 

SEQ2304 AA-GGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 

SEQ2305 AA-GGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 

SEQ2306 AA-GGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 

SEQ2307 AA-GGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 

SEQ2308 AA-GGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 

SEQ2309 AA-GGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA

SEQ2310 AAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 

SEQ2311 AA-GGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 

SEQ2301 AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACT 

SEQ2302 AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACT 

SEQ2303 AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACT 

SEQ2304 AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACT 

SEQ2305 AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACT 

SEQ2306 AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACT 

SEQ2307 AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACT 

SEQ2308 AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACT 

SEQ2309 AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACT

SEQ2310 AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACT 

SEQ2311 AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACT 

SEQ2301 ATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTG
SEQ2302 ATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTG
SEQ2303 ATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTG
SEQ2304 ATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTG
SEQ2305 ATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTG
SEQ2306 ATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTG
SEQ2307 ATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTG
SEQ2308 ATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTG
SEQ2309 ATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTG
SEQ2310 ATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTG
SEQ2311 ATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTG



SEQ2301 GTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGT
SEQ2302 GTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGT
SEQ2303 GTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGT
SEQ2304 GTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGT
SEQ2305 GTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGT
SEQ2306 GTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGT
SEQ2307 GTCAAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTCATCAGGACTTAAAATATTGGT
SEQ2308 GTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGT
SEQ2309 GTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGT
SEQ2310 GTCAAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTCATCAGGACTTAAAATATTGGT
SEQ2311 GTCAAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTCATCAGGACTTAAAATATTGGT



SEQ2301 TTGATAATATAAAGCAAATGAAGGAAGTACTGGGTACAAGAGGGCTATATCTTTTTTCCG
SEQ2302 TTGATAATATAAAGCAAATGAAGGAAGTACTGGGTACAAGAGGGCTATATCTTTTTTCCG
SEQ2303 TTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCG
SEQ2304 TTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCG
SEQ2305 TTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCG
SEQ2306 TTGATAATATAAAGCAAATGAAGGAAGTACTGGGTACAAGAGGGCTATATCTTTTTTCCG
SEQ2307 TTGATAATATAAAGTAAATGAAGGAAGTACTGTGTGCAAGAGGGCTATATCTTTTTTCCG
SEQ2308 TTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCG
SEQ2309 TTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCG
SEQ2310 TTGATAATATAAAGTAAATGAAGGAAGTACTGTGTGCAAGAGGGCTATATCTTTTTTCCG
SEQ2311 TTGATAATATAAAGCAAATGAAGGAAGTACTGTGTGCAAGAGGGCTATATCTTTTTTCCG



SEQ2301 GCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAA
SEQ2302 GCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAA
SEQ2303 GCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAA
SEQ2304 GCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAA
SEQ2305 GCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAA
SEQ2306 GCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAA
SEQ2307 GCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAA
SEQ2308 GCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAA
SEQ2309 GCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAA
SEQ2310 GCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAA
SEQ2311 GCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAA

SEQ2301 ATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAAC
SEQ2302 ATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAAC
SEQ2303 ATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAAC
SEQ2304 ATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAAC
SEQ2305 ATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAAC
SEQ2306 ATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAAC
SEQ2307 ATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAAC
SEQ2308 ATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAAC
SEQ2309 ATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAAC
SEQ2310 ATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAAC
SEQ2311 ATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAAC

SEQ2301 TCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC
SEQ2302 TCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC
SEQ2303 TCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC
SEQ2304 TCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC
SEQ2305 TCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC
SEQ2306 TCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC
SEQ2307 TCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC
SEQ2308 TCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC
SEQ2309 TCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC
SEQ2310 TCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC
SEQ2311 TCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGC

SEQ2301 ATCGTCCAGATATTTTAATTATCGGAGAGAT-TAGAGATCAAGCGACGGCCCGTGCTGTT
SEQ2302 ATCGTCCAGATATTTTAATTATCGGAGAGAT-TAGAGATCAAGCGACGGCTCGTGCTGTT
SEQ2303 ATCGTCCAGATATTTTAATTATCGGAGAGAT-TAGAGATCAAGCGACGGCCCGTGCTGTT
SEQ2304 ATCGTCCAGATATTTTAATTATCGGAGAGAT-TAGAGATCAAGCGACGGCCCGTGCTGTT
SEQ2305 ATCGTCCAGATATTTTAATTATCGGAGAGAT-TAGAGATCAAGCGACGGCCCGTGCTGTT
SEQ2306 ATCGTCCAGATATTTTAATTATCGGAGAGAT-TAGAGATCAAGCGACGGCCCGTGCTGTT
SEQ2307 ATCGTCCAGATATTTTAATTATCGGAGAGAT-TAGAGATCAAGCGACGGCCCGTGCTGTT
SEQ2308 ATCGTCCAGATATTTTAATTATCGGAGAGAAATAGAGATCAAGCGACGGCCCGTGCTGTT
SEQ2309 ATCGTCCAGATATTTTAATTATCGGAGAGAT-TAGAGATCAAGCGACGGCCCGTGCTGTT
SEQ2310 ATCGTCCAGATATTTTAATTATCGGAGAGAT-TAGAGATCAAGCGACGGCCCGTGCTGTT
SEQ2311 ATCGTCCAGATATTTTAATTATCGGAGAGAT-TAGAGATCAAGCGACGGCCCGTGCTGTT

SEQ2301 ATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTTCC
SEQ2302 ATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCC
SEQ2303 ATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCC
SEQ2304 ATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCC
SEQ2305 ATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCC
SEQ2306 ATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTTCC
SEQ2307 ATTCGTGCAAGTTTAACGGGAGTAATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCC
SEQ2308 ATTCGTGCAAGTTTAACGGGAGTGATGTTTTTTTCTACTATTCATGCTAAAAGTATTCCC
SEQ2309 ATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCC
SEQ2310 ATTCGTGCAAGTTTAACGGGAGTAATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCC
SEQ2311 ATTCGTGCAAGTTTAACGGGAGTAATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCC

SEQ2301 GGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTA
SEQ2302 GGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTA
SEQ2303 GGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTA
SEQ2304 GGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTA
SEQ2305 GGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTA
SEQ2306 GGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTA
SEQ2307 GGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTA
SEQ2308 GGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTA
SEQ2309 GGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTA
SEQ2310 GGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTA
SEQ2311 GGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTA

SEQ2301 AAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGT
SEQ2302 AAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGT
SEQ2303 AAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGT
SEQ2304 AAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGT
SEQ2305 AAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGT
SEQ2306 AAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGT
SEQ2307 AAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGT

SEQ2308 AAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGT
SEQ2309 AAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGT
SEQ2310 AAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGT
SEQ2311 AAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGT

SEQ2301 AACTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAA
SEQ2302 AACTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAA
SEQ2303 AATTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAA
SEQ2304 AATTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAA
SEQ2305 AATTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAA
SEQ2306 AACTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAA
SEQ2307 AACTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAA
SEQ2308 AATTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAA
SEQ2309 AATTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAA
SEQ2310 AACTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAA
SEQ2311 AACTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAA

SEQ2301 GGACATATCAGTAAGAAACAGGCACAAGT-CGAAAAAATTATCCCTCAAGAAACAACGGA
SEQ2302 GGATATATCAGTAAGAAACAGGCACAAGT-CGAAAAAATTATCCCTCAAGAAACAACGGA
SEQ2303 GGACATATCAGTAAGAAACAGGCACAAGT-CGAAAAAATTATCCCTCAAGAAACAACGGA
SEQ2304 GGACATATCAGTAAGAAACAGGCACAAGTGCGAAAAAATTATCCCTCAAGAAACAACGGA
SEQ2305 GGACATATCAGTAAGAAACAGGCACAAGT-CGAAAAAATTATCCCTCAAGAAACAACGGA
SEQ2306 GGACATATCAGTAAGAAACAGGCACAAGT-CGAAAAAATTATCCCTCAAGAAACAACGGA
SEQ2307 GGACATATCAGTAAGAAACAGGCACAAGT-CGAAAAAATTATCCCTCAAGAAACAACGGA
SEQ2308 GGACATATCAGTAAGAAACAGGCACAAGT-CGAAAAAATTATCCCTCAAGAAACAACGGA
SEQ2309 GGACATATCAGTAAGAAACAGGCACAAGT-CGAAAAAATTATCCCTCAAGAAACAACGGA
SEQ2310 GGACATATCAGTAAGAAACAGGCACAAGT-CGAAAAAATTATCCCTCAAGAAACAACGGA
SEQ2311 GGACATATCAGTAAGAAACAGGCACAAGT-CGAAAAAATTATCCCTCAAGAAACAACGGA

SEQ2301 AAGTAGTCCAACTTTT
SEQ2302 AAGTAGTCCAACTTTT
SEQ2303 AAGTAGTCCAACTTTT
SEQ2304 AAGTAGTCCAACTTTT--
SEQ2305 AAGTAGTCCAACTTTT
SEQ2306 AAGTAGTCCAACTTTT
SEQ2307 AAGTAGTCCAACTTTT
SEQ2308 AAGTAGTCCAACTTTT
SEQ2309 AAGTAGTCCAACTTTT
SEQ2310 AAGTAGTCCAACTTTT
SEQ2311 AAGTAGTCCAACTTTT



>SEQ ID NO 6350:63_090 frame: 2 

AVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRS
QLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIKQMKEVLGTR
GLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDAL
IKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVYDRLIELGVNYQ
ELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHISKKQAQVEKII
PQETTESSPTF 

>SEQ ID NO 6351:63_1169NT frame: 3 

.LL.NLYYCVFDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQLGSCDYELSEGR
LVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIKQMKEVLGTRGLYLFSGPVGSGK
TTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALIKLSLRHRPDILI
IGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQELENSLKLIAYQR
LIGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGYISKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 6352:63_18RS21 frame: 1

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 6353: 63_2603 frame: 1

DIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQLGSCDY
ELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIKQMKEVLGIRGLYLFSG
PVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALIKLSLRH
RPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQELENSLK
LIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHISKKQAQVRKNYPSRNNGK
-SNF

>SEQ ID NO 6354:63_A909 frame: 1

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI
SKKQAQVEKIIPQETTESSPTF 

>SEQ ID NO 6355:63_CJB110 frame: 1

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN
IKQMKEVLGTRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVY
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI
SKKQAQVEKIIPQETTESSPTF 

>SEQ ID NO 6356:63_CJB110 frame: 1 

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN
IKQMKEVLGTRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVY
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 6357: 63_H36B frame: 1 

SLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFVAG
MNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIK
QMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNE
DIGMTYDALIKLSLRHRPDILIIGEK

>SEQ ID NO 6358: 63_JM9130013 frame: 1

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
SKKQAQVEKIIPQETTESSPTF 

>SEQ ID NO 6359:63_M732 frame: 3

TCYETLYAYLMMKRRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQLGSCDYELSEGRL
VSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDNIK.MKEVLCARGLYLFSGPVGSGKT
TLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALIKLSLRHRPDILII
GEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQELENSLKLIAYQRL
IGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGHISKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 6360:63_M781 frame: 3

VEVNAQDIYIIPKGDCYEFYMRIDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQ 
LGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDNIKQMKEVLCARG 
LYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALI 
KLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQE 
LENSLKLIAYQRLIGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGHISKKQAQVEKIIP 
QETTESSPTF 

>SEQ ID NO 6361:63_COH1 frame: 3

VIVMKFYMRIDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQLGSCDYELSEGRL
VSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDNIK 
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SEQ6351 
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SEQ6351 
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SEQ6355 
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SEQ6350 
SEQ6351 
SEQ6352 
SEQ6353 
SEQ6354 
SEQ6355 
SEQ6356 
SEQ6357 
SEQ6358 
SEQ6359 
SEQ6360 
SEQ6361 

SEQ6350 
SEQ6351 
SEQ6352 
SEQ6353 
SEQ6354 
SEQ6355 
SEQ6356 
SEQ6357 
SEQ6358 
SEQ6359 
SEQ6360 
SBQ6361 

SEQ6350 
SEQ6351 
SEQ6352 
SEQ6353 
SEQ6354 
SEQ6355 
SEQ6356 
SEQ6357 
SEQ6358 
SEQ6359 
SEQ6360 
SEQ6361 



Table 23: Comparative Sequences relating t SAG0163 
(competence protein Cgl A) 



AVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV 

LLNLYYCVFDDERRFIDVFEFNRMASLISHFKFV 

QSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV 

DIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV 

QSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV 
QSIiAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV 
QSLAKQVI HQAVEVNAQDI YII PKGDCYELYMRIDDERRFIDVFEFNRMASLI SHFKFV 
-SIAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV 
QSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV 

TCYETLYAYLMMKRRFI DV FE FN RMASLI SHFKFV 

VEVNAQDIYI I PKGDCYEFYMRI DDERRFIDVFEFNRMAS LI SHFKFV 

VIVMKFYMRIDDERRFIDVFEFNRMASLISHFKFV 



AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDN
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDN
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDN



IKQMKEVLGTRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
IKQMKEVLGTRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
IKQMKEVLGTRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
IKQMKEVLGTRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
IK-MKEVLCARGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
IKQMKEVLCARGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
IK 



EDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVY 
EDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
EDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
EDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
EDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
EDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVY 
EDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVY 

EDIGMTYDALIKLSLRHRPDILIIGEK 

EDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
EDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
EDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 



RLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
RLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGYI 
RLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
RLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
RLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
RLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI
RLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 



RLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDIIAEEGHI 
RLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGHI 
RLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGHI 



SEQ6350 
SEQ6351 
SEQ6352 
SEQ6353 
SEQ6354 
SEQ6355 
SEQ6356 
SEQ6357 
SEQ6358 
SEQ6359 
SEQ6360 
SEQ6361 



KKQAQVEKIIPQETTESSPTF
KKQAQVEKIIPQETTESSPTF
KKQAQVEKIIPQETTESSPTF
KKQAQVRKNYPSRNNGKSNF-
KKQAQVEKIIPQETTESSPTF
KKQAQVEKIIPQETTESSPTF
KKQAQVEKIIPQETTESSPTF

KKQAQVEKIIPQETTESSPTF
KKQAQVEKIIPQETTESSPTF 
KKQAQVEKIIPQETTESSPTF 
SEQ ID NO. 2401: SAG0290 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE
COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTA 
TCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACA 
AAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCA 
GCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTCTTCTCAGACCCTATATCCCGTTCAAA 
TTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACAGAAG
TTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATA 
AAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGA 
CTTTATCCTATATGATGCCATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTC 
CTTTGAAAGGTAAAATTGGTAATAATAAGGATGGATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGT 
AAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAGATGGTACTTTGGCACGTTTAAG
TAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2402: SAG0290 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE
COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTA
TRAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACA
AAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCA
GCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAA
TTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAG
TTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATA
AAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGA
CTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTC
CTTTGAAAGGTAAAATTGGTAATAATAAGGATGGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGT
AAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAAATGGTACTTTGGCACGTTTAAG
TAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2403: SAG0290 FROM THE 2603 V/R GBS TYPE V STRAIN (REVERSE 
COMPLEMENT) 

ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGA 
CAGTTCCTTTTGATACTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCA 
TACAATAAAGAAAGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGG
GAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAGTTTTATCTGGCGTTA
ACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAATCAAATATGTT
TCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGA
TGCCATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAA 
TTGGTAATAATAAGGATGGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAG 

SEQ ID NO. 2404: SAG0290 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTA 

TCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACA 

AAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCA 

GCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAA

TTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAG

TTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATA

AAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGA

CTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCT

TTGAAAGGTAAAATTGGTAATAATAAGGATGGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTA 

CAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAAATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGT 

GGAGATTACGTTTCAAACATTGATAAA 
SEQ ID NO. 2405: SAG0290 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE
COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTA 
TCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACA 
AAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCA 
GCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAA
TTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAG
TTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATNNTAATAAAAAACCANTA
AAAATNAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGA
CTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTC
CTTTGAAAGGTAAAATTGGTAATAATAAGGATGGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGT
AAAACTCTACAGAAATTTATAAATAAGCGT 

SEQ ID NO. 2406: SAG0290 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE
COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTA 
TCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACA 
AAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCA 
GCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAA 
TTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAG 
TTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATA 
AAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGA
CTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTC
CTTTGAAAGGTAAAATTGGTAATAATAAGGATGGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGT
AAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAAATGGTACTTTGGCACGTTTAAG
TAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA 

SEQ ID NO. 2407: SAG0290 FROM THE COH1 GBS TYPE III STRAIN (REVERSE 
COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTA 
TCAAAAAGACGGGAAATTCAAAGGTTATGACGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACA 
AAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCA
GCTAATGATTTTTCATATAATAAAGAAAGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAA
TTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACAGAAG
TTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATA
AAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGAAAAATTGA
CTTTATCCTATATGATGCCATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTC
CTTTGAAAGGTAAAATTGGTAATAATAAGGATGGATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGT
AAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAGATGGTACTTTGGCACGTTTAAG
TAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2408: SAG0290 FROM THE H36b GBS TYPE Ib STRAIN
(REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTA 
TCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACA 
AAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCA 
GCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAA 
TTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAG
TTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATA 
AAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGA 
CTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTC 
CTTTGAAAGGTAAAATTGGTAATAATAAGGATGGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGT 
AAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAAATGGTACTTTGGCACGTTTAAG 
TAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA 
Comparative Sequences relating to SAG0290
(ABC transporter, substrate-binding protein)





SEQ ID NO. 2409: SAG0290 FROM THE JM9130013 GBS STRAIN VIII (REVERSE
COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTA 
TCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACA 
AAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCA 
GCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAA
TTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAG 
TTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATA 
AAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGA 
CTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTC 
CTTTGAAAGGTAAAATTGGTAATAATAAGGATGGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGT
AAAACTCTACAGAAATTTATAAATAAGCGTAATAAAGTTTTGAAAGAAAATGGTA

SEQ ID NO. 2410: SAG0290 FROM THE M732 GBS TYPE III STRAIN (REVERSE 
COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTA
TCAAAAAGACGGGAAATTCAAAGGTTATGACGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACA 
AAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCA 
GCTAATGATTTTTCATATAATAAAGAAAGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAA 
TTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACAGAAG 
TTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATA 
AAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGAAAAATTGA
CTTTATCCTATATGATGCCATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTC
CTTTGAAAGGTAAAATTGGTAATAATAAGGATGGATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGT
AAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAGATGGTACTTTGGCACGTTTAAG 
TAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA 

SEQ ID NO. 2411: SAG0290 FROM THE M781 GBS TYPE III STRAIN (REVERSE 
COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTA
TCAAAAAGACGGGAAATTCAAAGGTTATGACGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACA
AAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCA
GCTAATGATTTTTCATATAATAAAGAAAGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAA
TTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACAGAAG
TTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATA
AAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGAAAAATTGA
CTTTATCCTATATGATGCCATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTC
CTTTGAAAGGTAAAATTGGTAATAATAAGGATGGATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGT
AAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAGATGGTACTTTGGCACGTTTAAG
TAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ2401 TATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCA 
SEQ2402 TATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCA 



SEQ2403 
SEQ2404 
SEQ2405 
SEQ2406 
SEQ2407 
SEQ2408 
SEQ2409 
SEQ2410 
SEQ2411 



TATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCA 
TATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCA 
TATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCA 
TATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCA 
TATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCA 
TATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCA 
TATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCA 
TATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCA 



SEQ2401 
SEQ2402 
SEQ2403 
SEQ2404 
SEQ2405 
SEQ2406 
SEQ2407 
SEQ2408 
SEQ2409 



CATTTACTTATCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCT 
CATTTACTTATRAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCT 



CATTTACTTATCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCT 
CATTTACTTATCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCT 
CATTTACTTATCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCT 
CATTTACTTATCAAAAAGACGGGAAATTCAAAGGTTATGACGTTGATGTTGTCAAAGCT 
CATTTACTTATCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCT 
CATTTACTTATCAAAAAGACGGGAAATTCAAAGGTTATGATGTTGATGTTGTCAAAGCT



ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCT 
CATTTACTTATCAAAAAGACGGGAAATTCAAAGGTTATGACGTTGATGTTGTCAAAGCT 
CATTTACTTATCAAAAAGACGGGAAATTCAAAGGTTATGACGTTGATGTTGTCAAAGCT 

GTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCA 
GTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCA 
GTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCA 
GTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCA 
GTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCA 
GTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCA 
GTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCA 
GTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCA 
GTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCA 
GTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCA 
GTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATACTATTTCA 

ACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAA 
ACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAA 
ACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAA 
ACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAA 
ACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAA 
ACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAA 
ACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATATAATAAAGAA
ACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAA 
ACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAA 
ACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATATAATAAAGAA 
ACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATATAATAAAGAA 

AGAGCAGAAAAATATCTCTTCTCAGACCCTATATCCCGTTCAAATTATGCCGTAGTAGGG 
AGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGG 
AGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGG 
AGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGG 
AGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGG 
AGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGG 
AGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGG 
AGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGG 
AGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGG 
AGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGG 
AGAGCAGAAAAATATCTCTTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGG 

AAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACAGAAGTTTTA 
AAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAGTTTTA 
AAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAGTTTTA 
AAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAGTTTTA 
AAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAGTTTTA 
AAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAGTTTTA 
AAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACAGAAGTTTTA 
AAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAGTTTTA 
AAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACCGAAGTTTTA 
AAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACAGAAGTTTTA 
AAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAAATCAACAGAAGTTTTA 

TCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAA 
TCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAA 
TCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAA 
TCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAA 
TCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATNNTAATAAAAAA 
TCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAA 
TCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAA
TCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAA
TCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAA
TCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAA 
TCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAA 

CCAATAAAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATT 
CCAATAAAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATT 
CCAATAAAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATT 
CCAATAAAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATT 
CCANTAAAAATNAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATT 
CCAATAAAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATT 
CCAATAAAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATT 
CCAATAAAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATT 
CCAATAAAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATT 
Figure 25: Comparative Sequences relating to SAG0290
(ABC transporter, substrate-binding protein)

CCAATAAAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATT 
CCAATAAAAATCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATT

GAGAGTGGGAAAATTGACTTTATCCTATATGATGCCATTTCATCTGACTATATTGTAAAA 
GAGAGTGGGAAAATTGACTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAA 
GAGAGTGGGAAAATTGACTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAA 
GAGAGTGGGAAAATTGACTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAA 
GAGAGTGGGAAAATTGACTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAA
GAGAGTGGGAAAATTGACTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAA 
GAGAGTGGAAAAATTGACTTTATCCTATATGATGCCATTTCATCTGACTATATTGTAAAA 
GAGAGTGGGAAAATTGACTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAA 
GAGAGTGGGAAAATTGACTTTATCCTATATGATGCCATTTCATCCGACTATATTGTAAAA 
GAGAGTGGAAAAATTGACTTTATCCTATATGATGCCATTTCATCTGACTATATTGTAAAA 
GAGAGTGGAAAAATTGACTTTATCCTATATGATGCCATTTCATCTGACTATATTGTAAAA 

GATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGAT 
GACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGAT 
GACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGAT 
GACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGAT 
GACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGAT 
GACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGAT 
GATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGAT 
GACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGAT 
GACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGAT 
GATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGAT 
GATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGAT 

GGATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATA
GGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATA

GGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAG 

GGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATA 
GGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATA 
GGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATA 
GGATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATA 
GGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATA
GGACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATA 
GGATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATA 
GGATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATA 

ATAAGCGTATTAAAGTTTTGAAAGAAGATGGTACTTTGGCACGTTTAAGTAAACAATAT 
ATAAGCGTATTAAAGTTTTGAAAGAAAATGGTACTTTGGCACGTTTAAGTAAACAATAT 

ATAAGCGTATTAAAGTTTTGAAAGAAAATGGTACTTTGGCACGTTTAAGTAAACAATAT 

ATAAGCGT 

ATAAGCGTATTAAAGTTTTGAAAGAAAATGGTACTTTGGCACGTTTAAGTAAACAATAT
ATAAGCGTATTAAAGTTTTGAAAGAAGATGGTACTTTGGCACGTTTAAGTAAACAATAT 
ATAAGCGTATTAAAGTTTTGAAAGAAAATGGTACTTTGGCACGTTTAAGTAAACAATAT 

ATAAGCGTAATAAAGTTTTGAAAGAAAATGGTA 

ATAAGCGTATTAAAGTTTTGAAAGAAGATGGTACTTTGGCACGTTTAAGTAAACAATAT 
ATAAGCGTATTAAAGTTTTGAAAGAAGATGGTACTTTGGCACGTTTAAGTAAACAATAT 

TCGGTGGAGATTACGTTTCAAACATTGATAAA 

TCGGTGGAGATTACGTTTCAAACATTGATAAA 

TCGGTGGAGATTACGTTTCAAACATTGATAAA 

TCGGTGGAGATTACGTTTCAAACATTGATAAA 

TCGGTGGAGATTACGTTTCAAACATTGATAAA 

TCGGTGGAGATTACGTTTCAAACATTGATAAA 

TCGGTGGAGATTACGTTTCAAACATTGATAAA 

TCGGTGGAGATTACGTTTCAAACATTGATAAA



SEQ2401 

SEQ2402 

SEQ2403 

SEQ2404 

SEQ2405 

SEQ2406 

SEQ2407 

SEQ2408 

SEQ2409 

SEQ2410 

SEQ2411



>SEQ ID NO 2450: 8_1169NT frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY
FGGDYVSNIDK 

>SEQ ID NO 2451:8_18RS21 frame: 1

VSVQASEKVELKVATDSDTAPFTYXKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY
FGGDYVSNIDK 

>SEQ ID NO 2452:8_2603 frame: 2

FKGYDVDVVKAVFKGSKYKVTFKTVPFDTISTGIDAGKFDLSANDFSYNKERAEKYLFSD
PISRSNYAVVGKKGSHYKSLSDLSGKSTEVLSGVNYAQVLENWNKNHPNKKPIKIKYVSG
TTGVTSRLKNIESGKIDFILYDAISSDYIVKDQSLNLSVSPLKGKIGNNKDGLEYLLLPK 
DKK 

>SEQ ID NO 2453:8_090 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY
FGGDYVSNIDK 

>SEQ ID NO 2454:8_A909 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS 
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHXNKKPXKXKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKR

>SEQ ID NO 2455: 8_CJB110 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2456:8_COH1 frame: 1

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2457:8_H36B frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS 
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 
FGGDYVSNIDK 



>SEQ ID NO 2458:8_JM9130013 frame: 1

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRNKVLKENG 

>SEQ ID NO 2459:8_M732 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY
FGGDYVSNIDK

>SEQ ID NO 2460:8_M781 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY
FGGDYVSNIDK



SEQ2450 SVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS

SEQ2451 SVQASEKVELKVATDSDTAPFTYXKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS

SEQ2452 FKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS

SEQ2453 SVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS

SEQ2454 SVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS

SEQ2455 SVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS

SEQ2456 SVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS

SEQ2457 SVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS

SEQ2458 SVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS

SEQ2459 SVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS

SEQ2460 SVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS

SEQ2450 TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL 

SEQ2451 TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL

SEQ2452 TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL

SEQ2453 TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL

SEQ2454 TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL

SEQ2455 TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL

SEQ2456 TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL

SEQ2457 TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL

SEQ2458 TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL

SEQ2459 TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL

SEQ2460 TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL

SEQ2450 SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 

SEQ2451 SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 

SEQ2452 SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 

SEQ2453 SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 

SEQ2454 SGVNYAQVLENWNKNHXNKKPXKXKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK

SEQ2455 SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 

SEQ2456 SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK

SEQ2457 SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 

SEQ2458 SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 

SEQ2459 SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK

SEQ2460 SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK

SEQ2450 DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 

SEQ2451 DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 

SEQ2452 DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKK 

SEQ2453 DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY

SEQ2454 DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKR

SEQ2455 DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 

SEQ2456 DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 

SEQ2457 DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 

SEQ2458 DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRNKVLKENG 

SEQ2459 DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 

SEQ2460 DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 



GGDYVSNIDK 
GGDYVSNIDK 

GGDYVSNIDK 

GGDYVSNIDK 
GGDYVSNIDK 
GGDYVSNIDK 

GGDYVSNIDK 
GGDYVSNIDK 
Table 25: Comparative Sequences relating to SAG0368
(protein of unknown function)

SEQ ID NO. 2501: SAG0368 FROM THE 090 GBS TYPE Ia STRAIN

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAA 
GAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAAT 
AGCGATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTG 
ATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGT 
GCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGA 
TTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAAT 
GAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATG 
CGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATA 
TTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATATTGAGATA
TCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAA
GACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAG
AAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCT
AGTAATGATTCTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGT
TACAGTGGTAATACTACTTATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCT
AGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACG 
CCTAATCCA 

SEQ ID NO. 2502: SAG0368 FROM THE 1169NT1 GBS TYPE V STRAIN 

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAA 
GAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTTGGTCAGGAAA 
TAGCGATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATT 
GATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGCGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGG 
TGCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGG
ATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAA 
TGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTAT 
GCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAAT
ATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATATTGAGAT 
ATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAAGGTGA
AGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAA 
GAAAGAACTAGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGC
TAGTAATGATTCTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAG 
TTACAGTGGTAATACTACTTATAGTTCTGAGACTAATCAAACAACTCATCAAAGTTACTATAATAGTAGCACTCCTGC 
TAATAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAATGGGGCTGCAAC 
GCCTAATCCA 



SEQ ID NO. 2503: SAG0368 FROM THE 18RS21 GBS TYPE II STRAIN

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAA
GAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAAT 
AGCGATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTG 
ATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGT 
GCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGA 
TTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAAT 
GAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATG 
CGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATA
TTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAAGATGCAAACTAATATTGAGATA 
TCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAA 
GACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAT^ATAGAATTAAG 
AAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCT 
AGTAATGATTCTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGT 
TACAGTGGTAATACTACTTATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCT 
AGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACG 
CCTAATCCA

Table 25: Comparative Sequences relating to SAG0368
(protein of unknown function)

SEQ ID NO. 2504: SAG0368 FROM THE 2603 V/R GBS TYPE V STRAIN 

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAA 
GAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAAT 
AGCGATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTG
ATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGT
GCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGA 
TTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAAT 
GAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATG 
CGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATA 
TTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATATTGAGATA 
TCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAA 
GACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAG 
AAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCT 
AGTAATGATTCTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGT 
TACAGTGGTAATACTACTTATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCT 
AGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACG 
CCTAATCCA 

SEQ ID NO. 2505: SAG0368 FROM THE A909 GBS TYPE Ia STRAIN

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAA 
GAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAAT 
AGCGATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTG 
ATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGT 
GCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGA
TTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAAT 
GAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATG 
CGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATA 
TTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATATTGAGATA
TCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAA
GACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAG
AAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCT 
AGTAATGATTCTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGT 
TACAGTGGTAATACTACTTATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCT 
AGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACG 
CCTAATCCA 

SEQ ID NO. 2506: SAG0368 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE
COMPLEMENT) 

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAA 
GAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAAT 
AGCGATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTG
ATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGT
GCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGA 
TTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAAT 
GAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATG 
CGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATA
TTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATATTGAGATA
TCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAA
GACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAG 
AAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCT 
AGTAATGATTCTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGT 
TACAGTGGTAATACTACTTATTAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGC 
TAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAAC 
GCCTAATCCA

SEQ ID NO. 2507: SAG0368 FROM THE COH1 GBS TYPE III STRAIN (REVERSE
COMPLEMENT)

GATTTTAAGCTAGATAAATCAAAAAGTCATGCTATTGAAGAAACAAAGCCGTTTTCAATACTATTAATGGGTGTGGAC 
ACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCTTAGTCACTATAAATCCTAAAACT 
AATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGC 
GTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGAT 
ATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTGGTCAATGCTGTTGGTGGTATAACAGTA 
ACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACAT 
AAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAA
AGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTT
TCCGCAGTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGAT
TCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTACTCTATCAGATGGTGGCTCTTATCAAATTTTA 
ACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAAAGAGCTGGATAAAAAGCGTAGTAAAACTCTGAAGACA 
AGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTACTTATTCATCAACACAAGAGAAT 
TATTATTATACAACACCCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACTTATAGTTCTGAGACTAATCA
AACAACTCATCAAAGTTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGA 
TTCAAGTGGAAGTGTTAATAATTATAACGGGGCTGCAACGCCTAATCCAAACACAGGAACGCAACCAGTACCAGGTCA
AACTAATCCA 

SEQ ID NO. 2508: SAG0368 FROM THE H36b GBS TYPE Ib STRAIN

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAA 
GAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAAT
AGCGATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTG 
ATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGT
GCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGA 
TTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAAT 
GAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATG 
CGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATA 
TTGGCGTTAAATAGTA 

SEQ ID NO. 2509: SAG0368 FROM THE ????? 

TTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTC
CTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTACTTTATCAG 
ATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAAAGAACTGGATAAAA
AGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTA 
CTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTA 
CTTATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTA 
ACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2510: SAG0368 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE 
COMPLEMENT) 

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAA
GAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAAT 
AGCGATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTG 
ATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGT
GCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGA
TTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAAT 
GAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATG
CGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATA 
TTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATATTGAGATA 
TCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAA 
GACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAG
AAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCT 
AGTAATGATTCTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGT
TACAGTGGTAATACTACTTATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCT 
AGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACG 
CCTAATCCA

SEQ ID NO. 2511: SAG0368 FROM THE M781 GBS TYPE III STRAIN (REVERSE
COMPLEMENT) 

TTCAATACTATTAATGGGTGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGAT 
CTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGG 
TCCCAAAAATAATGGACAGACTGGCGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATT 
GATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTGGT
CAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAA 
GGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCC 
AGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAG
TATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGAT 
TCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTACTCTATC
AGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAAAGAGCTGGATAA
AAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTC 
TACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATAC
TACTTATAGTTCTGAGACTAATCAAACAACTCATCAAAGTTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAG 
TAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTTAATAATTATAACGGGGCTGCAACGCCTAATCCAAACAC 
AGGAACGCAACCAGTACCAGGTCAAACTAATCCA

SEQ2501 

SEQ2502 

SEQ2503 

SEQ2504 

SEQ2505 

SEQ2506 

SEQ2507 ATTTTAAGCTAGATAAATCAAAAAGTCATGCTATTGAAGAAACAAAGCCGTTTTCAATA 

SEQ2508 

SEQ2509 

SEQ2510 

SEQ2511 TTCAATA 



SEQ2501
SEQ2502
SEQ2503
SEQ2504
SEQ2505
SEQ2506
SEQ2507 TATTAATGGGTGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGC
SEQ2508
SEQ2509
SEQ2510
SEQ2511 TATTAATGGGTGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGC

SEQ2501
SEQ2502
SEQ2503
SEQ2504
SEQ2505
SEQ2506
SEQ2507 ATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTA
SEQ2508
SEQ2509
SEQ2510
SEQ2511 ATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTA

SEQ2501
SEQ2502
SEQ2503
SEQ2504
SEQ2505
SEQ2506
SEQ2507 AACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGCGTAGAA
SEQ2508
SEQ2509
SEQ2510
SEQ2511 AACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGCGTAGAA

SEQ2501
SEQ2502
SEQ2503
SEQ2504
SEQ2505
SEQ2506
SEQ2507 CAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAA
SEQ2508
SEQ2509
SEQ2510
SEQ2511 CAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAA

SEQ2501
SEQ2502
SEQ2503
SEQ2504
SEQ2505
SEQ2506
SEQ2507 ACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGAT
SEQ2508
SEQ2509
SEQ2510
SEQ2511 ACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGAT

SEQ2501
SEQ2502
SEQ2503
SEQ2504
SEQ2505
SEQ2506
SEQ2507 TGGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATT
SEQ2508
SEQ2509
SEQ2510
SEQ2511 TGGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATT

SEQ2501
SEQ2502
SEQ2503
SEQ2504
SEQ2505
SEQ2506
SEQ2507 CTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGA
SEQ2508
SEQ2509
SEQ2510
SEQ2511 CTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGA

SEQ2501

SEQ2502 

SEQ2503 

SEQ2504 

SEQ2505 

SEQ2506 

SEQ2507 AACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGT 

SEQ2508 

SEQ2509 

SEQ2510 

SEQ2511 AACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGT 

SEQ2501 TATAATTTTTCG 

SEQ2502 TATAATTTTTCG 

SEQ2503 TATAATTTTTCG 

SEQ2504 TATAATTTTTCG 

SEQ2505 TATAATTTTTCG 

SEQ2506 TATAATTTTTCG 

SEQ2507 AAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGT 

SEQ2508 TATAATTTTTCG 

SEQ2509 

SEQ2510 TATAATTTTTCG 

SEQ2511 AAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGT

SEQ2501 CTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCT 

SEQ2502 CTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCT 

SEQ2503 CTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCT 

SEQ2504 CTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCT

SEQ2505 CTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCT

SEQ2506 CTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCT

SEQ2507 TTAGTTCAT-ACAAAAAAATTCTTTCCGCAGTAAGTAA--TAACATGCAAACTAATATT

SEQ2508 CTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCT

SEQ2509 TTAGTTCAT-ACAAAAAAATTCTTTCCGCAGTAAGTAA--TAACATGCAAACTAATATT

SEQ2510 CTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCT

SEQ2511 TTAGTTCAT-ACAAAAAAATTCTTTCCGCAGTAAGTAA--TAACATGCAAACTAATATT

SEQ2501 TTGAAGAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCAT 

SEQ2502 TTGAAGAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCAT 

SEQ2503 TTGAAGAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCAT 

SEQ2504 TTGAAGAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCAT 

SEQ2505 TTGAAGAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCAT 

SEQ2506 TTGAAGAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCAT

SEQ2507 AGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCA---TTGGAACAT

SEQ2508 TTGAAGAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCAT

SEQ2509 AGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCA---TTGGAACAT

SEQ2510 TTGAAGAAACAAAGCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCAT

SEQ2511 AGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCA---TTGGAACAT

SEQ2501 GAAAATCTAAGT-GGTCAGGAAATAGCGATTCTATGATCTTAGTCACTATAAATCCTAA 

SEQ2502 GAAAATCTAAGTTGGTCAGGAAATAGCGATTCTATGATCTTAGTCACTATAAATCCTAA 

SEQ2503 GAAAATCTAAGT-GGTCAGGAAATAGCGATTCTATGATCTTAGTCACTATAAATCCTAA 

SEQ2504 GAAAATCTAAGT-GGTCAGGAAATAGCGATTCTATGATCTTAGTCACTATAAATCCTAA

SEQ2505 GAAAATCTAAGT-GGTCAGGAAATAGCGATTCTATGATCTTAGTCACTATAAATCCTAA 

SEQ2506 GAAAATCTAAGT-GGTCAGGAAATAGCGATTCTATGATCTTAGTCACTATAAATCCTAA 

SEQ2507 TTAAATCTTATC-AGTTGAAGGGTGAAGACGCTACTCTATCAG--ATGGTGGCTCTTAT

SEQ2508 GAAAATCTAAGT-GGTCAGGAAATAGCGATTCTATGATCTTAGTCACTATAAATCCTAA 

SEQ2509 TTAAATCTTATC-AGTTGAAGGGTGAAGACGCTACTTTATCAG--ATGGTGGCTCTTAT

SEQ2510 GAAAATCTAAGT-GGTCAGGAAATAGCGATTCTATGATCTTAGTCACTATAAATCCTAA 

SEQ2511 TTAAATCTTATC-AGTTGAAGGGTGAAGACGCTACTCTATCAG--ATGGTGGCTCTTAT

SEQ2501 ACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCC
SEQ2502 ACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCC
SEQ2503 ACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCC
SEQ2504 ACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCC
SEQ2505 ACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCC
SEQ2506 ACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCC
SEQ2507 AAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAAAGAGCTGGAT
SEQ2508 ACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCC
SEQ2509 AAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAAAGAACTGGAT
SEQ2510 ACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCC
SEQ2511 AAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAAAGAGCTGGAT

SEQ2501 AAAAATAATGGACAGACTGGAGTAGAAGCAAAG--CTAAATGCAGCCTATGCTTCTGGT
SEQ2502 AAAAATAATGGACAGACTGGCGTAGAAGCAAAG--CTAAATGCAGCCTATGCTTCTGGT
SEQ2503 AAAAATAATGGACAGACTGGAGTAGAAGCAAAG--CTAAATGCAGCCTATGCTTCTGGT
SEQ2504 AAAAATAATGGACAGACTGGAGTAGAAGCAAAG--CTAAATGCAGCCTATGCTTCTGGT
SEQ2505 AAAAATAATGGACAGACTGGAGTAGAAGCAAAG--CTAAATGCAGCCTATGCTTCTGGT
SEQ2506 AAAAATAATGGACAGACTGGAGTAGAAGCAAAG--CTAAATGCAGCCTATGCTTCTGGT
SEQ2507 AAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACT
SEQ2508 AAAAATAATGGACAGACTGGAGTAGAAGCAAAG--CTAAATGCAGCCTATGCTTCTGGT
SEQ2509 AAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACT
SEQ2510 AAAAATAATGGACAGACTGGAGTAGAAGCAAAG--CTAAATGCAGCCTATGCTTCTGGT
SEQ2511 AAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACT

SEQ2501 GTGC-GGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTT
SEQ2502 GTGC-GGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTT
SEQ2503 GTGC-GGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTT
SEQ2504 GTGC-GGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTT
SEQ2505 GTGC-GGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTT
SEQ2506 GTGC-GGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTT
SEQ2507 CTGCTAGTAATGATTCTTCTACTTATTCATCAAC-ACAAGAGAATTATTATTAT-ACAA
SEQ2508 GTGC-GGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTT
SEQ2509 CTGCTAGTAATGATTCTTCTACTTATTCATCAAC-ACAAGAGAATAATTATAAT-ACAA
SEQ2510 GTGC-GGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTT
SEQ2511 CTGCTAGTAATGATTCTTCTACTTATTCATCAAC-ACAAGAGAATAATTATAAT-ACAA

SEQ2501 ATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGT
SEQ2502 ATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGT
SEQ2503 ATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGT
SEQ2504 ATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGT
SEQ2505 ATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGT
SEQ2506 ATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGT
SEQ2507 ACCCTTATTCAGAAGCACCACCAAGTTACAGTGGT-AATACTACTTATAGTT---CTGA
SEQ2508 ATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGT
SEQ2509 ACC-TTATTCAGAAGCACCACCAAGTTACAGTGGT-AATACTACTTATAGTT---CTGA
SEQ2510 ATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGT
SEQ2511 ACC-TTATTCAGAAGCACCACCAAGTTACAGTGGT-AATACTACTTATAGTT---CTGA

SEQ2501 ACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGT
SEQ2502 ACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGT
SEQ2503 ACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGT
SEQ2504 ACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGT
SEQ2505 ACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGT
SEQ2506 ACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGT
SEQ2507 ACTAATCAAAC-AACTCATCAA----AGTTACTAT-AATAG--TAGCACTCCTGCTAGT
SEQ2508 ACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGT
SEQ2509 ACTAATCAAAC-AACTCATCAA----AATTACTAT-AATAG--TAGCACTCCTGCTAGT
SEQ2510 ACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGT
SEQ2511 ACTAATCAAAC-AACTCATCAA----AGTTACTAT-AATAG--TAGCACTCCTGCTAGT

SEQ2501 GTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCG
SEQ2502 GTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCG
SEQ2503 GTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCG
SEQ2504 GTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCG
SEQ2505 GTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCG
SEQ2506 GTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCG
SEQ2507 ACTATAGCAGTAACAC-TAACACAGGTCAGGCTGATTCAAGTGGAAGTGTTAATAATTA
SEQ2508 GTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCG
SEQ2509 ACTATAGCAGTAACAC-TAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCA
SEQ2510 GTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCG
SEQ2511 ACTATAGCAGTAACAC-TAACACAGGTCAGGCTGATTCAAGTGGAAGTGTTAATAATTA

SEQ2501 TATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAA
SEQ2502 TATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAA
SEQ2503 TATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAA
SEQ2504 TATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAA
SEQ2505 TATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAA
SEQ2506 TATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAA
SEQ2507 AACGGGGCTGCAACGCCTAATCCAAACACAGGAACGCAACCAGTACCAGGTCAAACTAA
SEQ2508 TATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAA
SEQ2509 AACGGGGCTGCAACGCCTAATCCA
SEQ2510 TATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAA
SEQ2511 AACGGGGCTGCAACGCCTAATCCAAACACAGGAACGCAACCAGTACCAGGTCAAACTAA

SEQ2501 GTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGC
SEQ2502 GTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGC
SEQ2503 GTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGC
SEQ2504 GTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGC
SEQ2505 GTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGC
SEQ2506 GTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGC
SEQ2507 CCA
SEQ2508 GTCCTTAAAAAAATATTGGCGTTAAATAGTA
SEQ2509
SEQ2510 GTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGC
SEQ2511 CCA

SEQ2501 GTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTT
SEQ2502 GTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTT
SEQ2503 GTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTT
SEQ2504 GTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTT
SEQ2505 GTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTT
SEQ2506 GTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTT
SEQ2507
SEQ2508
SEQ2509
SEQ2510 GTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTT
SEQ2511

SEQ2501 GCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTAC
SEQ2502 GCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAAGGTGAAGACGCTAC
SEQ2503 GCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTAC
SEQ2504 GCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTAC
SEQ2505 GCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTAC
SEQ2506 GCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTAC
SEQ2507
SEQ2508
SEQ2509
SEQ2510 GCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTAC
SEQ2511

SEQ2501 TTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAA
SEQ2502 TTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAA
SEQ2503 TTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAA
SEQ2504 TTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAA
SEQ2505 TTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAA
SEQ2506 TTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAA
SEQ2507
SEQ2508
SEQ2509
SEQ2510 TTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAA
SEQ2511

SEQ2501 AGAATTAAGAAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCT
SEQ2502 AGAATTAAGAAAGAACTAGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCT
SEQ2503 AGAATTAAGAAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCT
SEQ2504 AGAATTAAGAAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCT
SEQ2505 AGAATTAAGAAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCT
SEQ2506 AGAATTAAGAAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCT
SEQ2507
SEQ2508
SEQ2509
SEQ2510 AGAATTAAGAAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCT
SEQ2511

SEQ2501 TATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTACTTATTCATCAACACA
SEQ2502 TATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTACTTATTCATCAACACA
SEQ2503 TATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTACTTATTCATCAACACA
SEQ2504 TATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTACTTATTCATCAACACA
SEQ2505 TATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTACTTATTCATCAACACA
SEQ2506 TATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTACTTATTCATCAACACA
SEQ2507
SEQ2508
SEQ2509
SEQ2510 TATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTACTTATTCATCAACACA
SEQ2511

SEQ2501 GAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATAC
SEQ2502 GAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATAC
SEQ2503 GAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATAC
SEQ2504 GAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATAC
SEQ2505 GAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATAC
SEQ2506 GAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATAC
SEQ2507
SEQ2508
SEQ2509
SEQ2510 GAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATAC
SEQ2511

SEQ2501 ACTTAT-AGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTC
SEQ2502 ACTTAT-AGTTCTGAGACTAATCAAACAACTCATCAAAGTTACTATAATAGTAGCACTC
SEQ2503 ACTTAT-AGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTC
SEQ2504 ACTTAT-AGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTC
SEQ2505 ACTTAT-AGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTC
SEQ2506 ACTTATTAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTC
SEQ2507
SEQ2508
SEQ2509
SEQ2510 ACTTAT-AGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTC
SEQ2511

SEQ2501 TGCTAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCA
SEQ2502 TGCTAATAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCA
SEQ2503 TGCTAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCA
SEQ2504 TGCTAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCA
SEQ2505 TGCTAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCA
SEQ2506 TGCTAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCA
SEQ2507
SEQ2508
SEQ2509
SEQ2510 TGCTAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCA
SEQ2511

SEQ2501 TAATCATAACGGGGCTGCAACGCCTAATCCA
SEQ2502 TAATCATAATGGGGCTGCAACGCCTAATCCA
SEQ2503 TAATCATAACGGGGCTGCAACGCCTAATCCA
SEQ2504 TAATCATAACGGGGCTGCAACGCCTAATCCA
SEQ2505 TAATCATAACGGGGCTGCAACGCCTAATCCA
SEQ2506 TAATCATAACGGGGCTGCAACGCCTAATCCA
SEQ2507
SEQ2508
SEQ2509
SEQ2510 TAATCATAACGGGGCTGCAACGCCTAATCCA
SEQ2511

>SEQ ID NO 2550: 54_090 frame: 1

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

>SEQ ID NO 2551:54_1169NT frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKLVRK.RFYDLSH
YKS.N..NNDDKLRT.RID.IEWSQK.WTDWRRSKAKCSLCFWWCGNGIDDCSRLIRY.C
.LLYAN.YARIS.FSQCCWWYNSN..I.LSNINCCQ.TRVQGCC.TRDT.NKWRTSTCLF
SYAL..SRGRLWASKKTT.SNSKSP.KNIGVK.Y.FIQKNSFRSK..HAN.Y.DIIKNDS
.FVSL.RFIGTY.ILSVER.RRYFIRWWLLSNFN.ETSTCSSK.N.ERTR.KA..NSEDK
RDSI.RLLWYYC...FFYLFINTRE.L.YNTLFRSTTKLQW.YYL.F.D.SNNSSKLL..
.HSC..L.Q.H.HRSG.FKWKCQ.S.WGCNA.S
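
The effect of the single-base difference between the nucleotide listings can be checked directly. The following is a minimal Python sketch, not part of the original disclosure: it translates two short fragments copied from the listings above, one without and one with the additional T after CTAAGT (the difference shown between the SEQ2501 and SEQ2502 rows of the alignment), using the standard genetic code and '.' for stop codons as in the frame translations. The function name `translate` and the variable names are illustrative assumptions.

```python
from itertools import product

# Standard genetic code in TCAG order; '.' marks a stop codon,
# matching the notation of the frame translations in this table.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY..CC.W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 1) -> str:
    """Translate dna in a 1-based reading frame; a trailing partial codon is dropped."""
    seq = dna.upper()[frame - 1:]
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Fragments copied from the nucleotide listings (illustrative variable names).
without_insertion = "CATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCT"
with_insertion = "CATCGAAAATCTAAGTTGGTCAGGAAATAGCGATTCT"

print(translate(without_insertion))
print(translate(with_insertion))
```

Under these assumptions the first fragment translates to HRKSKWSGNSDS and the second to HRKSKLVRK.RF, consistent with the frame 1 translations listed for SEQ ID NO 2550 and SEQ ID NO 2551.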

>SEQ ID NO 2552:54_18RS21 frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP

>SEQ ID NO 2553:54_2603 frame: 1

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP

>SEQ ID NO 2554: 54_A909 frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP

>SEQ ID NO 2555:54_CJB110 frame: 1

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTY.F.D.SNNSSKLL..

>SEQ ID NO 2556:54_COH1 frame: 1

DFKLDKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVTINPKTNKTTMTSL
ERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINVDYFMQINMQGLVD
LVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYSRMRYDDPEGDYGR
QKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIPNLLAYKDSLEHIK
SYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTSAILYEDYYGTTAS
NDSSTYSSTQENYYYTTPLFRSTTKLQW.YYL.F.D.SNNSSKLL...HSC..L.Q.H.H
RSG.FKWKC..L.RGCNA.SKHRNATSTRSN.S

>SEQ ID NO 2557:54_H36B frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS 
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP

>SEQ ID NO 2558:54_JM9130013 frame: 1

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP

>SEQ ID NO 2559:54_M781 frame: 2 

SILLMGVDTGSEHRKSKWSGNSDSMILVTINPKTNKTTMTSLERDVLIKLSGPKNNGQTG 
VEAKLNAAYASGGAEMALMTVQDLLDINVDYFMQINMQGLVDLVNAVGGITVTNKFDFPI 
SIAANEPEYKAVVEPGTHKINGEQALVYSRMRYDDPEGDYGRQKRQREVIQKVLKKILAL
NSISSYKKILSAVSNNMQTNIEISSKTIPNLLAYKDSLEHIKSYQLKGEDATLSDGGSYQ
ILTKKHLLAVQNRIKKELDKKRSKTLKTSAILYEDYYGTTASNDSSTYSSTQENNYNTTP
YSEAPPSYSGNTTYSSETNQTTHQSYYNSSTPASNYSSNTNTGQADSSGSVNNYNGAATP
NPNTGTQPVPGQTNP 



SEQ2550 NFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 

SEQ2551 NFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKLVRKRFYDLSHY

SEQ2552 NFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 

SEQ2553 NFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 

SEQ2554 NFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 

SEQ2555 NFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 

SEQ2556 DFKLDKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 

SEQ2557 NFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 

SEQ2558 NFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 

SEQ2559 SILLMGVDTGSEHRKSKWSGNSDSMILVT 

SEQ2550 NPKTNKTTMT SLERDVL IKL S GPKNNGQTGVEAKLNAAYAS GGAEMALMTVQ DLLDI N V 

SEQ2551 SNNNDDKLRTRIDIEWSQKWTDWRRS KAKCSLCFWWCGNGIDDCSRLIRYCLLY 

S6Q2552 NPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMAI^VQDLLDINV 

SEQ2553 NPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 

SEQ2554 NPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 

SEQ2555 NPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV

SEQ2556 NPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV

SEQ2557 NPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV

SEQ2558 NPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV

SEQ2559 NPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV

SEQ2550 YFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAWEPGTHKINGEQALVYS

SEQ2551 NYARISFSQCCWWYNS NILSNINCCQTRVQGCCTRDTNKWRTSTCLFSY 

SEQ2552 YFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAWEPGTHKINGEQALVYS 

SEQ2553 YFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAWEPGTHKINGEQALVYS 

SEQ2554 YFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAWEPGTHKINGEQALVYS 

SEQ2555 YFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAWEPGTHKINGEQALVYS

SEQ2556 YFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAWEPGTHKINGEQALVYS

SEQ2557 YFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAWEPGTHKINGEQALVYS 

SEQ2558 YFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAWEPGTHKINGEQALVYS 

SEQ2559 YFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAWEPGTHKINGEQALVYS

SEQ2550 MRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP

SEQ2551 LSRGRLWASKKTTSNSKSPKNIGVKYFIQKNSFRSKHANYDIIKNDSFVSLRFIGTYI- 

SEQ2552 MRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP

SEQ2553 MRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP

SEQ2554 MRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP

SEQ2555 MRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP

SEQ2556 MRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP

SEQ2557 MRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP

SEQ2558 MRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP

SEQ2559 MRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP

SEQ2550 LLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 

SEQ2551 L-SVERRRYFIRWWLLSNFNETSTCSSKNERTRKANSEDKRDSIRLLWYYCFFYLFINT 

SEQ2552 LLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS

SEQ2553 LLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS

SEQ2554 LLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS

SEQ2555 LLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 

SEQ2556 LLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 

SEQ2557 LLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 

SEQ2558 LLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS

SEQ2559 LLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS

SEQ2550 ILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 

SEQ2551 ELY NTLFRST TKLQW YYLFD SNNS S KLLHS CLQH

SEQ2552 ILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 

SEQ2553 ILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 

SEQ2554 ILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 

SEQ2555 ILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYFDSNNSSKLL 

SEQ2556 ILYEDYYGTTASNDSSTYSSTQENYYYTTPLFRSTTKLQWYYLFDSNNSSKLLHSCLQH 

SEQ2557 ILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 

SEQ2558 ILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 

SEQ2559 ILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQSYYNS 

SEQ2550 TPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

SEQ2551 RSGFKWKCQSWGCNAS 

SEQ2552 TPASNYSSNTNTGQADSSGSVNNHNGAATPNP

SEQ2553 TPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

SEQ2554 TPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

SEQ2555 

SEQ2556 RSGFKWKCLRGCNASKHRNATSTRSNS

SEQ2557 TPASNYSSNTNTGQADSSGSVNNHNGAATPNP

SEQ2558 TPASNYSSNTNTGQADSSGSVNNHNGAATPNP

SEQ2559 TPASNYSSNTNTGQADSSGSVNNYNGAATPNPNTGTQPVPGQTNP 
Table 26: Comparative Sequences relating to SAG0503 (lipase/acylhydrolase)

SEQ ID NO. 2601: SAG0503 FROM THE 090 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT) 

GGGCACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAA
AGACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATAC 
AACCTCTCAAGGTGGTTTTGTCCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAA 
TTATGGTGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGA 
GAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATC 
ACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAAAGAAATACTTGCAAAAGCAAGACAAGATAA 
TCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAATGCAAAC
CGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTTGTCCCAATTAATGA 
CCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGC 
TCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAA 
TGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAG 



SEQ ID NO. 2602: SAG0503 FROM THE H36b GBS TYPE Ib STRAIN
(REVERSE COMPLEMENT) 

TTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCT 
AACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCA 
AGGTGGTTTTGTTCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGT 
GTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGA 
TTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTC 
CTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAAAGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATT
GCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAATGCAAACCGTTATTGA 
TAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTTGTCCCAATTAATGACCGCCTTTA 
TAAGGGAATAAATGGTAAAGAGGGTATTATAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTAC 
TGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATGAAACAAG 
AAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAGTGGTCC 

SEQ ID NO. 2603: SAG0503 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE 
COMPLEMENT)

GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCC
TAACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTC
AAGGTGGTTTTGTTCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTG 
TGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTG
ATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATT 
CCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAAAGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAAT
TGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAATGCAAACCGTTATTG
ATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTTGTCCCAATTAATGACCGCCTTT 
ATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTA 
CTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATGAAACAA 
GAAAAAACTGGCCGAACCCAGCTTTCTTGTACAA



SEQ ID NO. 2604: SAG0503 FROM THE COH1 GBS TYPE III STRAIN (REVERSE 
COMPLEMENT) 

GGACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAG
ACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGGGATACAA 
CCTCTCAAGGTGGTTTTGTCCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATT
ATGGTGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGA 
AAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCAC 
TAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAAAGAAATTCTTGCAAAAGCAAGACAAGATAATC
CTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAATGCAAACCG 
TTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTTGTCCCAATTAATGACC 
GCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTC 
TCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATG
AAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA 
SEQ ID NO. 2605: SAG0503 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE
COMPLEMENT) 

GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCC
TAACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTC 
AAGGTGGTTTTGTCCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTG 
TGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTG 
ATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATT 
CCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAAAGAAATACTTGCAAAAGCAAGACAAGATAATCCTAAAT 
TGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAATGCAAACCGTTATTG
ATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTTGTCCCAATTAATGACCGCCTTT 
ATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTA 
CTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATGAAACAA
GAAAAAACTGGCCGAACCCAGCTTTCTTGTACAA 

SEQ ID NO. 2606: SAG0503 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE 
COMPLEMENT) 

GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCC 
TAACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGGGATACAACCTCTC 
AAGGTGGTTTTGTCCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTG 
TGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTG 
ATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATT
CCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAAAGAAATTCTTGCAAAAGCAAGACAAGATAATCCTAAAT
TGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAATGCAAACCGTTATTG 
ATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTTGTCCCAATTAATGACCGCCTTT 
ATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTA
CTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATGAAACAA 
GAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA 

SEQ ID NO. 2607: SAG0503 FROM THE JM9130013 GBS TYPE VIII STRAIN 
(REVERSE COMPLEMENT) 

GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCC 
TAACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTC 
AAGGTGGTTTTGTTCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTG 
TGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTG 
ATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATT 
CCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAAAGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAAT
TGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAATGCAAACCGTTATTG
ATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTTGTCCCAATTAATGACCGCCTTT 
ATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTA
CTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATGAAACAA 
GAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA 

SEQ ID NO. 2608: SAG0503 FROM THE 2603 V/R GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

AGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTC
CTAACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCT
CAAGGTGGTTTTGTTCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGT
GTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCT 
GATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAAT
TCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAAAGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAA 
TTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAATGCAAACCGTTATT 
GATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTTGTCCCAATTAATGACCGCCTT 
TATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTT 
ACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATGAAACA 
AGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAGTGG 

SEQ ID NO. 2609: SAG0503 FROM THE M781 GBS TYPE III STRAIN
(REVERSE COMPLEMENT) 

GGACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAG
ACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGGGATACAA 
CCTCTCAAGGTGGTTTTGTCCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATT 
ATGGTGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGA
AAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCAC 
TAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAAAGAAATTCTTGCAAAAGCAAGACAAGATAATC 
CTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAATGCAAACCG 
TTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTTGTCCCAATTAATGACC 
GCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTC 
TCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATG 
AAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA 



SEQ2601 GGCACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAA
SEQ2602         TTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAA
SEQ2603        GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAA
SEQ2604  GGACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAA
SEQ2605        GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAA
SEQ2606        GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAA
SEQ2607        GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAA
SEQ2608       AGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAA
SEQ2609  GGACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAA

SEQ2601 TCCTAAATTAACAAAAAAAGACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGC
SEQ2602 TCCTAAATTAACAAAAAAAGACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGC
SEQ2603 TCCTAAATTAACAAAAAAAGACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGC
SEQ2604 TCCTAAATTAACAAAAAAAGACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGC
SEQ2605 TCCTAAATTAACAAAAAAAGACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGC
SEQ2606 TCCTAAATTAACAAAAAAAGACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGC
SEQ2607 TCCTAAATTAACAAAAAAAGACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGC
SEQ2608 TCCTAAATTAACAAAAAAAGACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGC
SEQ2609 TCCTAAATTAACAAAAAAAGACTTCCTAACAAAGAAAGTTATCCCACTTAACTATGTTGC

SEQ2601 TCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTCCC
SEQ2602 TCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTCC
SEQ2603 TCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTCC
SEQ2604 TCTTGGAGATTCTCTGACCGAAGGTGTGGGGGATACAACCTCTCAAGGTGGTTTTGTCCC
SEQ2605 TCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTCCC
SEQ2606 TCTTGGAGATTCTCTGACCGAAGGTGTGGGGGATACAACCTCTCAAGGTGGTTTTGTCCC
SEQ2607 TCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTCC
SEQ2608 TCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTCC
SEQ2609 TCTTGGAGATTCTCTGACCGAAGGTGTGGGGGATACAACCTCTCAAGGTGGTTTTGTCCC

SEQ2601 ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGG
SEQ2602 ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGG
SEQ2603 ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGG
SEQ2604 ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGG
SEQ2605 ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGG
SEQ2606 ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGG
SEQ2607 ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGG
SEQ2608 ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGG
SEQ2609 ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGG

SEQ2601 TGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGA
SEQ2602 TGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGA
SEQ2603 TGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGA
SEQ2604 TGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGA
SEQ2605 TGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGA
SEQ2606 TGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGA
SEQ2607 TGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGA
SEQ2608 TGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGA
SEQ2609 TGTGTCTGGGAATACTAGTCAACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGA

SEQ2601 AAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGC
SEQ2602 AAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGC
SEQ2603 AAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGC
SEQ2604 AAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGC
SEQ2605 AAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGC
SEQ2606 AAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGC
SEQ2607 AAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGC
SEQ2608 AAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGC
SEQ2609 AAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGATGTCTTGGC

SEQ2601 TGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGC
SEQ2602 TGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGC
SEQ2603 TGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGC
SEQ2604 TGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGC
SEQ2605 TGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGC
SEQ2606 TGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGC
SEQ2607 TGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGC
SEQ2608 TGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGC
SEQ2609 TGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGC

SEQ2601 ATATAAGGAACGTTTGAAAGAAATACTTGCAAAAGCAAGACAAGATAATCCTAAATTGCC
SEQ2602 ATATAAGGAACGTTTGAAAGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCC
SEQ2603 ATATAAGGAACGTTTGAAAGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCC
SEQ2604 ATATAAGGAACGTTTGAAAGAAATTCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCC
SEQ2605 ATATAAGGAACGTTTGAAAGAAATACTTGCAAAAGCAAGACAAGATAATCCTAAATTGCC
SEQ2606 ATATAAGGAACGTTTGAAAGAAATTCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCC
SEQ2607 ATATAAGGAACGTTTGAAAGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCC
SEQ2608 ATATAAGGAACGTTTGAAAGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCC
SEQ2609 ATATAAGGAACGTTTGAAAGAAATTCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCC

SEQ2601 TATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAAT
SEQ2602 TATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAAT
SEQ2603 TATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAAT
SEQ2604 TATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAAT
SEQ2605 TATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAAT
SEQ2606 TATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAAT
SEQ2607 TATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAAT
SEQ2608 TATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAAT
SEQ2609 TATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCCACAATTAACTAAAAT

SEQ2601 GCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAA
SEQ2602 GCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAA
SEQ2603 GCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAA
SEQ2604 GCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAA
SEQ2605 GCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAA
SEQ2606 GCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAA
SEQ2607 GCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAA
SEQ2608 GCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAA
SEQ2609 GCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAA

SEQ2601 TGTTTATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTAT
SEQ2602 TGTTTATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTAT
SEQ2603 TGTTTATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTAT
SEQ2604 TGTTTATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTAT
SEQ2605 TGTTTATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTAT
SEQ2606 TGTTTATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTAT
SEQ2607 TGTTTATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTAT
SEQ2608 TGTTTATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTAT
SEQ2609 TGTTTATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTAT

SEQ2601 TACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTACTGGAGACCA
SEQ2602 TATAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTACTGGAGACCA
SEQ2603 TACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTACTGGAGACCA
SEQ2604 TACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTACTGGAGACCA
SEQ2605 TACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTACTGGAGACCA
SEQ2606 TACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTACTGGAGACCA
SEQ2607 TACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTACTGGAGACCA
SEQ2608 TACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTACTGGAGACCA
SEQ2609 TACAGAGTCATCAAATAGTCAGGCAAGTATCACTAATGATGCTCTCTTTACTGGAGACCA

SEQ2601 TTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAA
SEQ2602 TTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAA
SEQ2603 TTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAA
SEQ2604 TTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAA
SEQ2605 TTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAA
SEQ2606 TTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAA
SEQ2607 TTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAA
SEQ2608 TTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAA
SEQ2609 TTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAA



SEQ2601 TGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAG
SEQ2602 TGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAGTGGTCC
SEQ2603 TGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAA
SEQ2604 TGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA
SEQ2605 TGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAA
SEQ2606 TGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA
SEQ2607 TGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA
SEQ2608 TGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAGTGG
SEQ2609 TGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAATABCMARATVSTNCSRA



SEQ2601 

SEQ2602 

SEQ2603 

SEQ2604 

SEQ2605 

SEQ2606 

SEQ2607 

SEQ2608 

SEQ2609 NGTSAGASACYHYDAS 



>SEQ ID NO 2650:103_090 frame: 2 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVP

LLSESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLA 

VIRKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKM

QTVIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDH

FHPNNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2651:103_H36B frame: 2

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS 
ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR 
KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV 
IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGIIESSNSQASITNDALFTGDHFHP
NNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2652:103_18RS21 frame: 3

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS

ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR

KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV 

IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP 

NNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2653:103_COH1 frame: 3

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPL 
LSESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAV 
IRKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQ 
TVIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHF
HPNNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2654:103_CJB110 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS

ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR 

KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV 

IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP

NNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2655:103_1169NT frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS
ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR
KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV
IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP
NNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2656:103_JM9130013 frame: 3
IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS
ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR
KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV 
IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP 
NNIGYQIMSNAVMEKINETRKNWP 
>SEQ ID NO 2657:103_2603 frame: 1 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLL 

SESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVI

RKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQT

VIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFH

PNNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2658:103_M781 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPL 
LSESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAV 
IRKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQ
TVIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHF
HPNNIGYQIMSNAVMEKINETRKNWP 



SEQ2650 IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLSESLHNRY
SEQ2651 IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLSESLHNRY
SEQ2652 IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLSESLHNRY
SEQ2653 IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLSESLHNRY
SEQ2654 IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLSESLHNRY
SEQ2655 IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLSESLHNRY
SEQ2656 IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLSESLHNRY
SEQ2657 IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLSESLHNRY
SEQ2658 IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLSESLHNRY

SEQ2650 SYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIRKELSHLS
SEQ2651 SYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIRKELSHLS
SEQ2652 SYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIRKELSHLS
SEQ2653 SYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIRKELSHLS
SEQ2654 SYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIRKELSHLS
SEQ2655 SYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIRKELSHLS
SEQ2656 SYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIRKELSHLS
SEQ2657 SYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIRKELSHLS
SEQ2658 SYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIRKELSHLS

SEQ2650 LNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTVIDNWNKA
SEQ2651 LNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTVIDNWNKA
SEQ2652 LNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTVIDNWNKA
SEQ2653 LNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTVIDNWNKA
SEQ2654 LNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTVIDNWNKA
SEQ2655 LNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTVIDNWNKA
SEQ2656 LNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTVIDNWNKA
SEQ2657 LNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTVIDNWNKA
SEQ2658 LNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTVIDNWNKA

SEQ2650 TKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHPNNIGYQI
SEQ2651 TKEVVDASENVYFVPINDRLYKGINGKEGIIESSNSQASITNDALFTGDHFHPNNIGYQI
SEQ2652 TKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHPNNIGYQI
SEQ2653 TKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHPNNIGYQI
SEQ2654 TKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHPNNIGYQI
SEQ2655 TKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHPNNIGYQI
SEQ2656 TKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHPNNIGYQI
SEQ2657 TKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHPNNIGYQI
SEQ2658 TKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHPNNIGYQI

SEQ2650 MSNAVMEKINETRKNWP
SEQ2651 MSNAVMEKINETRKNWP
SEQ2652 MSNAVMEKINETRKNWP
SEQ2653 MSNAVMEKINETRKNWP
SEQ2654 MSNAVMEKINETRKNWP
SEQ2655 MSNAVMEKINETRKNWP
SEQ2656 MSNAVMEKINETRKNWP
SEQ2657 MSNAVMEKINETRKNWP
SEQ2658 MSNAVMEKINETRKNWP
Table 27: Comparative Sequences relating to SAG1473
(cell wall surface anchor family protein) 

SEQ ID NO. 2701: SAG1473 FROM THE 1169NT1 GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGA
ACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATC
CGTCAACTAATCCACCTACAACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGA
ACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTAT
TAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAA
GTGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

SEQ ID NO. 2702: SAG1473 FROM THE 18RS21 GBS TYPE II STRAIN 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGA
ACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATC
CGTCAACTAATCCACCTACAACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGA
ACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTAT
TAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAA 
ATGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

SEQ ID NO. 2703: SAG1473 FROM THE 2603 V/R GBS TYPE V STRAIN 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGA 
ACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATC
CGTCAACTAATCCACCTACAACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGA
ACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTAT
TAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAA
ATGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2704: SAG1473 FROM THE 090 GBS TYPE Ia STRAIN

GACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCGTC
AACTAATCCACCTACAACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGA
AGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAG
AATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAAATGA
TGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2705: SAG1473 FROM THE A909 GBS TYPE Ia STRAIN

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATTAGATGA 

ACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATC

CCTCAACTAATCCACCTACAACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGC 

ACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTAT 

TAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAA

ATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

SEQ ID NO. 2706: SAG1473 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGA

ACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATC

CGTCAACTAATCCACCTACAACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGA 

ACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTAT 

TAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAA

ATGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

SEQ ID NO. 2707: SAG1473 FROM THE COH1 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGA
ACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCAAGTTCATCAAGTGAACCAGAAACAAATC
CCTCAACTAATCCACCTACAACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGGAGC
ACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTAT
TAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGAACGCGATGAATCATCATCTTCAAAAGCAA
ATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

SEQ ID NO. 2708: SAG1473 FROM THE H36b GBS TYPE Ib STRAIN

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATTAGATGA 
ACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATC
CCTCAACTAATCCACCTACAAGAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGC
ACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTAT
TAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAA
ATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

SEQ ID NO. 2709: SAG1473 FROM THE JM9130013 GBS TYPE VIII STRAIN

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATTAGATGA
ACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATC
CCTCAACTAATCCACCTACAACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGC
ACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTAT
TAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAA
ATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2710: SAG1473 FROM THE M732 GBS TYPE III STRAIN 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGA
ACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCAAGTTCATCAAGTGAACCAGAAACAAATC
CCTCAACTAATCCACCTACAACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGGAGC
ACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTAT
TAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGAACGCGATGAATCATCATCTTCAAAAGCAA
ATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2711: SAG1473 FROM THE M781 GBS TYPE III STRAIN 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGA
ACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCAAGTTCATCAAGTGAACCAGAAACAAATC
CCTCAACTAATCCACCTACAACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGGAGC
ACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTAT
TAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAA
ATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA



SEQ2701 ATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAA
SEQ2702 ATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAA
SEQ2703 ATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAA
SEQ2704
SEQ2705 ATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAA
SEQ2706 ATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAA
SEQ2707 ATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAA
SEQ2709 ATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAA
SEQ2710 ATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAA
SEQ2711 ATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAA

SEQ2701 GATCAGATGAACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCA
SEQ2702 GATCAGATGAACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCA
SEQ2703 GATCAGATGAACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCA
SEQ2704 GACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCA
SEQ2705 GATTAGATGAACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCA
SEQ2706 GATCAGATGAACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCA
SEQ2707 GATCAGATGAACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCAAGTTCA
SEQ2709 GATTAGATGAACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCGAGTTCA
SEQ2710 GATCAGATGAACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCAAGTTCA
SEQ2711 GATCAGATGAACTAGACCAGTCTAGTACTGGTTCTTCTTCTGAAAATGAATCAAGTTCA

SEQ2701 TCAAGTGAACCAGAAACAAATCCGTCAACTAATCCACCTACAACAGAACCATCGCAACCC
SEQ2702 TCAAGTGAACCAGAAACAAATCCGTCAACTAATCCACCTACAACAGAACCATCGCAACCC
SEQ2703 TCAAGTGAACCAGAAACAAATCCGTCAACTAATCCACCTACAACAGAACCATCGCAACCC
SEQ2704 TCAAGTGAACCAGAAACAAATCCGTCAACTAATCCACCTACAACAGAACCATCGCAACCC
SEQ2705 TCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCATCGCAACCC
SEQ2706 TCAAGTGAACCAGAAACAAATCCGTCAACTAATCCACCTACAACAGAACCATCGCAACCC
SEQ2707 TCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCATCGCAACCC
SEQ2709 TCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCATCGCAACCC
SEQ2710 TCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCATCGCAACCC
SEQ2711 TCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCATCGCAACCC

SEQ2701 TCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAG
SEQ2702 TCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAG
SEQ2703 TCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAG
SEQ2704 TCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAG
SEQ2705 TCACCTAGTGAAGAGAACAAGCCTGATGGTAGCACGAAGACAGAAATTGGCAATAATAAG
SEQ2706 TCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAG
SEQ2707 TCACCTAGTGAAGAGAACAAGCCTGATGGGAGCACGAAGACAGAAATTGGCAATAATAAG
SEQ2709 TCACCTAGTGAAGAGAACAAGCCTGATGGTAGCACGAAGACAGAAATTGGCAATAATAAG
SEQ2710 TCACCTAGTGAAGAGAACAAGCCTGATGGGAGCACGAAGACAGAAATTGGCAATAATAAG
SEQ2711 TCACCTAGTGAAGAGAACAAGCCTGATGGGAGCACGAAGACAGAAATTGGCAATAATAAG

SEQ2701 GATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAA
SEQ2702 GATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAA
SEQ2703 GATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAA
SEQ2704 GATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAA
SEQ2705 GATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAA
SEQ2706 GATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAA
SEQ2707 GATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAA
SEQ2709 GATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAA
SEQ2710 GATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAA
SEQ2711 GATATTTCTAGTGGAACAAAAGTATTAATTTCAGAAGATAGTATTAAGAATTTTAGTAAA

SEQ2701 GCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAAGTGAT
SEQ2702 GCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAAATGAT
SEQ2703 GCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAAATGAT
SEQ2704 GCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAAATGAT
SEQ2705 GCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAAATGAT
SEQ2706 GCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAAATGAT
SEQ2707 GCAAGTAGTGATCAAGAAGAAGTGGAACGCGATGAATCATCATCTTCAAAAGCAAATGAT
SEQ2709 GCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAAATGAT
SEQ2710 GCAAGTAGTGATCAAGAAGAAGTGGAACGCGATGAATCATCATCTTCAAAAGCAAATGAT
SEQ2711 GCAAGTAGTGATCAAGAAGAAGTGGATCGCGATGAATCATCATCTTCAAAAGCAAATGAT
SEQ2701
SEQ2702 
SEQ2703 
SEQ2704 
SEQ2705 
SEQ2706 
SEQ2707 
SEQ2709 
SEQ2710 
SEQ2711 

SEQ2701 
SEQ2702 
SEQ2703 
SEQ2704 
SEQ2705 
SEQ2706 
SEQ2707 
SEQ2709 
SEQ2710 
SEQ2711 

SEQ2701 
SEQ2702 
SEQ2703 
SEQ2704 
SEQ2705 
SEQ2706 
SEQ2707 
SEQ2709 
SEQ2710 
SEQ2711 

SEQ2701 
SEQ2702 
SEQ2703 
SEQ2704 
SEQ2705 
SEQ2706 
SEQ2707 
SEQ2709 
SEQ2710 
SEQ2711 

SEQ2701 
SEQ2702 
SEQ2703 
SEQ2704 
SEQ2705 
SEQ2706 
SEQ2707 
SEQ2709 
SEQ2710 
SEQ2711 





GGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

GGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

GGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

GGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

GAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

GGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

GAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAASGATACAAGTGATAAGAATACTGACAC 

GAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

GAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

GAGAAAAAAGGCCACAGTAAGCCXAAAAAGGAATABCMARATVSTNCSRATNGTSAGCWA 



AGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATTAGATGAACTAGACCAGTCTAG 



TRACANCHRAMYRTN- 



ACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCCTC 



ACTAATCCACCTACAACAGAACCATCGCAACCCTCACCTAGTGAAGAGAACAAGCCTGA 



GGTAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACAAAAGTATT 



SEQ2701 
SEQ2702 
SEQ2703 
SEQ2704 
SEQ2705 
SEQ2706 
SEQ2707 
SEQ2709 
SEQ2710 
SEQ2711 



ATTTCAGAAGATAGTATTAAGAATTTTAGTAAAGCAAGTAGTGATCAAGAARAAGTGGA 



SEQ2701 
SEQ2702 
SEQ2703 
SEQ2704 
SEQ2705 
SEQ2706 
SEQ2707 
SEQ2709 
SEQ2710 
SEQ2711 



CGCGATGAATCATCATCTTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAA 



SEQ2701 
SEQ2702 
SEQ2703 
SEQ2704 
SEQ2705 
SEQ2706 
SEQ2707 
SEQ2709 
SEQ2710 
SEQ2711 



AAGGAA 



>SEQ ID NO 2750:4_1169NT frame: 1

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKASD
GKKGHSKPKKE

>SEQ ID NO 2751:4_18RS21 frame: 1

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
GKKGHSKPKKE

>SEQ ID NO 2752:4_2603 frame: 1

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
GKKGHSKPKKE

>SEQ ID NO 2753:4_090 frame: X

DQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQPSPSEENKPDGRTKTEIGNNKDISSG
TKVLISEDSIKNFSKASSDQEEVDRDESSSSKANDGKKGHSKPKKE

>SEQ ID NO 2754:4_A909 frame: 1

DTSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
EKKGHSKPKKE

>SEQ ID NO 2755:4_CJB110 frame: 1

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
GKKGHSKPKKE

>SEQ ID NO 2756:4_COH1 frame: 1

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVERDESSSSKAND
EKKGHSKPKKE

>SEQ ID NO 2757:4_H36B frame: 1

DTSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEXVDRDESSSSKAND
EKKGHSKPKKE

>SEQ ID NO 2758:4_JM9130013 frame: 1

DTSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
EKKGHSKPKKE

>SEQ ID NO 2759:4_M732 frame: 1

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVERDESSSSKAND
EKKGHSKPKKE

>SEQ ID NO 2760:4_M781 frame: 1

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
EKKGHSKPKKE



SEQ2750 TSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SEQ2751 TSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SEQ2752 TSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SEQ2753 DQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SEQ2754 TSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SEQ2755 TSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SEQ2756 TSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SEQ2757 TSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SEQ2758 TSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SEQ2759 TSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SEQ2760 TSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP

SEQ2750 SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKASD
SEQ2751 SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
SEQ2752 SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
SEQ2753 SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
SEQ2754 SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
SEQ2755 SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
SEQ2756 SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVERDESSSSKAND
SEQ2757 SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEXVDRDESSSSKAND
SEQ2758 SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
SEQ2759 SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVERDESSSSKAND
SEQ2760 SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND

SEQ2750 KKGHSKPKKE
SEQ2751 KKGHSKPKKE
SEQ2752 KKGHSKPKKE
SEQ2753 KKGHSKPKKE
SEQ2754 KKGHSKPKKE
SEQ2755 KKGHSKPKKE
SEQ2756 KKGHSKPKKE
SEQ2757 KKGHSKPKKE
SEQ2758 KKGHSKPKKE
SEQ2759 KKGHSKPKKE
SEQ2760 KKGHSKPKKE
Table 28: Comparative Sequences relating to SAG1552
(conserved hypothetical protein)

SEQ ID NO. 2801: SAG1552 FROM THE 1169NT1 GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

TTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCT 

TCCTTAGCAGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAGTGGTTCCATTTAATTTCCAAC 

ATGGGGGCAAATACTGTAAGAGTCAAAGTACCGATGAATGTTGCATTTTACGATGCTTTATATCACCACAACAAAGCA 

TCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAAT 

GATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAAT 

ACTGATTTTGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAAT 

AGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGCGGCA 

GCTAATCCATTTGAGGTCATGCTAGCTCAAGTTATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAA 

CATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCGTTATCGAAAACCATTTGAGGCACAGGCTCCTAAA 

TACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCGAATGTTAAAGCAGGTATTTTTGCAGCATATAAAGCTATT 

GATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAC^TATCAGTAAAGAAGATAGA 

GAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGC 

* A ;jI GGACAGC ^^ 

» G I^ ACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTT 

AATGCAAGGGCGTGGAATACATCCTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAAT 
A^CATCC?C?G™^ 

SEQ ID NO. 2802: SAG1552 FROM THE 

ATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCT 

S^ AAAA !iZ A ^ 

TATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGT 
AGTAATTTTGAGCAGATCAATATGGTATTGAGAAATAC^ 

A ^I^™ AACT ^ TCCTACTGGTCTTCTCAAAACAG 

ACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCT 
^ AAAAAATTCACGATGATT ^^ 

GCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACC 
TTTTTAAAAGACTCCTATTATAGTATTTAAGAAAGAA ^WM»AfaACCC6ATACCAAAACC 

SEQ ID NO. 2803: SAG1552 FROM THE 18RS21 GBS TYPE II STRAIN 
™ G ,^ A I TAAAAGA ^ 

CCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAA 
ACGTATCGTGAATGGTTCCATTTAATTTCC^CATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCA 
TTTTACGATGCCTTATATCACCACAACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCT 

G ™ T CTCCATGGGCGTAAGCAAGTA TGGAATACTGATTTGGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGG 

^ AC nr^ TTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCTT ^^^ 

In™™ G ^ TATTTTAAAACTTCTGTGGCAGCTAaTC 

ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCAT 

» A I^™ CCATTTGAGGCACAGGCTCCTAAATACGTACAACTA ^^ 

^ TG P TGCAGCATATAAAGCTATTGAT TTCCATCCTCGATACAAGGATTATCTATTATTTC»TAAAGAG 
AA ™* CAG J AAAGAAGATA ^^ 

GAGAAA ^^ GCG 3 GTTCTAGTCACGGGTTATGGC TATTCGA(^GCGAGAGGTATTGC 

~™~ C CGATTAATGAAAAAGAACAA GGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTT 
GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGT 
CAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTAT 
^ G ^^ A ^ G I™ GAGGCAAAGGAGAGTGGAAACATCCTC ^^^ 

AGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCAATA 
GATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTG 

fH AACGG J AAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAG 
^ A n^ A r GTTGAAGACATGGAAAAAGTAAAAG ^ 

A^CAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAG 
GTCAGAATTCCGTGGCAGTTGTTGAATTTTTCT<^TCCATCATCTCAAAAAATTCACGATGAT^CT 

GCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATGTATTAAGAAA 


SEQ ID NO. 2804: SAG1552 FROM THE 2603 V/R GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

TATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTG 
TTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATC 
GTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACG 
ATGCCTTATATCACCACAACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCA 
ATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTC 
TCCATGGGCGTAAGCAAGTATGGAATACTGATTTGGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTG 
GTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAATATAAAG 
GACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAATTGACACATT 
ATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCATTATCGAAA
AACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCAG
GTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAATATCA
GTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAA
TCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGC 
CGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGA 
CTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCC 
TATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTG
ATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTG
ATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCAATAGATATTA 
CACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTG 
ATCGAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACG 
GTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATTGAGAAATACAA 
AGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAACAG 
GAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAA 
TTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGA 
AGGAGTTAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATT 
ATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATAGTATTAAGAAAGAATGGT 
CTAAAGAAAGAGAGAGAACATATGGTCCA 

SEQ ID NO. 2805: SAG1552 FROM THE A909 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT) 

AAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAA 
CCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAA
ACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCA 
TTTTACGATGCCTTATATCACCACAACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCT 
TATCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTG 
GATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTGGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGG 
GTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAA 
TATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAATTG 
ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCAT 
TATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTT
AAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAG
AATATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTAT 
CACAAAATCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGT
CCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTT 
GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGT 
CAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTAT 
CAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCT 
AGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCAATA 
GATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTG 
TCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAG
CTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATTGAGA 
AATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTC
AAAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAG 
GTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAGAATTCACGATGATTACTTTAAACATTAT 
GGTGTGAAGGAGTTAGAAAATTGAGAGCCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGA 
TGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGA 

SEQ ID NO. 2806: SAG1552 FROM THE CJB110 GBS NONTYPEABLE STRAIN 



TATTACTTTGATGGTAGTTTGTATTTACCAAAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGT 

GATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTAT 

CATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACT 

GTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGCATCAAAGAGGCCACTG

TATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGG 

TATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACAGATTTTGGTAGC 

CGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCT 

TATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAG 

GTCATGCTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTT 

TCAAACTCACCAACAACAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAAT 

GTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGA

TACAAGGATTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCA

CAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGA 

GGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGAT 

TATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGG 

AATACATCTTTCGCCACAAATAAACATAATCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTA 

TTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCTCTGATG 

ACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAA

AAACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTC

ACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTAT

AATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGT 

AATTTTGAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGG

TTCTTACCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACA

GATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAA 

AAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCT 

AATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTT 

TTAAAAGACTCCTATTATGTATTAAGAAAGA

SEQ ID NO. 2807: SAG1552 FROM THE COH1 GBS TYPE III STRAIN 

TTTACCACAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCAC
CAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTAC
TCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAA 
TGTTGCATTTTACGATGCCTTATATCACCACAACAAAGAATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTAT 
AGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGG 
CGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCCGTCATTATCATTATGATCTTAG 
TCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAA 
AACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGA 
TGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCC
TTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTC 
AAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGA 
TAAAGAGAATATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAA
TGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGA 
TAAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGG 
TAGTTTTGGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAA 
ACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACA 
TCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTT 
ATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATT
ACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTT 
TGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCT 
TCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT 
ATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGG 
TCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAACCAGATATTTCGTTTGGAAAGGACTT
TATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAA
ACATTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGAT 
AAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACT 

SEQ ID NO. 2808: SAG1552 FROM THE H36b GBS TYPE Ib STRAIN

AAGGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAA 

ACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAA 

AACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGC 

ATTTTACGATGCCTTATATCACCACAACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTC 

TTATCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGT

GGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCAGTCATTATCATTATGATCTTAGTCCTTG 

GGTACTTGGTTATGTCGTAGGGGATGATGGACATAGTGGTACTGTCGCTTTATACTAATCATCAAGAGGAGAAAAACG 

CAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAA 

TTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTT

CATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCGAAT 

GTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAA 

GAGAATATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCT 

TATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTACTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAA 

CGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGT 

TTTGGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGTGTGGAATACATCCTTCGCCACAAATAAACAT 

AGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCAT 

TATCAGGTTGATGGTAAAAGAGGCAAAGAAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATAT 

GCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCA 

ATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTA 

TTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAACGCCTTAAAAGCGAACTATCTTCGA 

CAGCTTAATGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATTG 

AGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTT 

CTCAAAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATA 

GAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACAT

TATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAG 

ATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATAGT



SEQ ID NO. 2809: SAG1552 FROM THE JM9130013 GBS TYPE VIII STRAIN 

ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGT 

CTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCA 

ACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAG

CATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTA 

ATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGA 

ATACTGATTTTGGTAGCAGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGA 

ATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGG 

CAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGC

AACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTA 

AATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCGAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTA 

TTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGACAAAAGATTA 

AAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATG

GCTACTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTC 

AGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGACGATT 

GGAATGCAAGGGTGTGGAATACATCCTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTA 

ATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAGGTTGATGGTAAAAGAGGCAAAGAAGAGT

GGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGA 

TTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAA 

TGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTAT 

TTGTCCAAGAGCGCTATAACGCCTTAAAAGCGAACTATCTTCGACAGCTTAATGGTAAAGATTTTTATGCTTTCCCAC 

CAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAG

TAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAA 

CATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTT

CTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTG 

CTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGAC 

CCGATACCAAAACCTTTTTAAAAGACTCCTATTATAGTATTAAGAAAG 
SEQ ID NO. 2810: SAG1552 FROM THE M732 GBS TYPE III STRAIN

TACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGT 
AGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGTTCCA 
TTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCA 
CCACAACAAAGAATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTAT 
AACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAA 
GCAAGTATGGAATACTGATTTTGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGG 
GGATGATTGCAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAA 
AACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAA 
ATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCATTATCGAAAACCATTTGAGGC 
ACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCAGGTATGTTTGCAGC 
ATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAG 
ACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGT
CACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAAAA 
AGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATG
GCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGC 
ACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGG
CAAAGGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCT 
CTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGG
TAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAA 
GTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAAAGATTTTTA 
TGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTGAAGA 
CATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAG 
GCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGTT 
GTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAAT 
TGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAA
TTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATAGTATTAAG

SEQ ID NO. 2811: SAG1552 FROM THE M781 GBS TYPE III STRAIN 

TTTGATGGTAGTTTGTATTTACCACAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACT
GTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCAC 
AACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGA 
GTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGAATCAAAGAGGCCACTGTATTTG 
TTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTA
AAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCCGTCAT 
TATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCTTATACT 
AATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATG
CTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAAC
TCACCAACAACAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAA 
AATATTCAAGCTAATTCAAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAG 
GATTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGAGAAAAGATTAAAGAACTTTCTTTGTCACAGGGA
TACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATT
GCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAA
TCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACA 
TCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGC
TTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCTCTGATGACTAGT 
GCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTA 
AAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTT 
TCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCC 
TTAAAAGCGAACTATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTT
GAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTA 
CCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATT 
TCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATT 
CACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGC 
AAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAA 
GACTCCTATTATAGTATTAAGAAAGAATGG

Table 25: Comparative Sequences relating to SAG1552
(conserved hypothetical protein)

SEQ2801 

SEQ2802 

SEQ2803 AAGGGCTTATTAAAAGAAAATACAAGAACT 

SEQ2804 TATTAAAAGAAAATACAAGAACT 

SEQ2805 AAGGGCTTATTAAAAGAAAATACAAGAACT 

SEQ2806 ATTACTTTGATGGTAGTTTGTATTTACCAAAGGGCTTATTAAAAGAAAATACAAGAACT

SEQ2807 TTTACCACAGGGCTTATTAAAAGAAAATACAAGAACT 

SEQ2808 AAGGGGCTTATTAAAAGAAAATACAAGAACT 

SEQ2809 

SEQ2810 TACAAGAACT 

SEQ2811 TTTGATGGTAGTTTGTATTTACCACAGGGCTTATTAAAAGAAAATACAAGAACT 

SEQ2801 --TTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTT

SEQ2802 

SEQ2803 ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTT 

SEQ2804 ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCTATAAACCTTTTGTTGTT

SEQ2805 ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTT 

SEQ2806 ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTT

SEQ2807 ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTT 

SEQ2808 ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTT 

SEQ2809 ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTT 

SEQ2810 ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTT 

SEQ2811 ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTT 

SEQ2801 AAGGAGTAGACGTTGAGTCTTCCTTAGCAGGTTATCATCACAACGATTTTCCTATTACT 

SEQ2802 

SEQ2803 AAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACT

SEQ2804 AAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACT 

SEQ2805 AAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACT 

SEQ2806 AAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACT 

SEQ2807 AAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACT

SEQ2808 AAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACT 

SEQ2809 AAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACT

SEQ2810 AAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACT 

SEQ2811 AAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACT 

SEQ2801 AAAAAACGTATCGTGAGTGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGA 

SEQ2802 

SEQ2803 AAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGA

SEQ2804 AAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGA 

SEQ2805 AAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGA 

SEQ2806 AAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGA 

SEQ2807 AAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGA 

SEQ2808 AAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGA

SEQ2809 AAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGA 

SEQ2810 AAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGA 

SEQ2811 AAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGA 

SEQ2801 TCAAGGTACCGATGAATGTTGCATTTTACGATGCTTTATATCACCACAACAAAGCATCA

SEQ2802 

SEQ2803 TCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGCATCA 

SEQ2804 TCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGCATCA

SEQ2805 TCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGCATCA 

SEQ2806 TCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGCATCA

SEQ2807 TCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGAATCA 

SEQ2808 TCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGCATCA 

SEQ2809 TCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGCATCA 

SEQ2810 TCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGAATCA 

SEQ2811 TCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGAATCA 

SEQ2801 AGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCT 

SEQ2802 

SEQ2803 AGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCT 

SEQ2804 AGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCT 

SEQ2805 AGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCT

SEQ2806 AGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCT 

SEQ2807 AGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCT 

SEQ2808 AGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCT 






SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 



AGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCT 
AGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCT 
AGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCT 

TAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTG 

TAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTG 
TAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTG 
TAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTG
TAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTG 
TAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTG 
TAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTG 
TAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTG 
TAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTG 
TAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTG 

ATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCCGTCATTATCAT 

ATGACTA-GTGCAACAGGAGATGACTTATAT—GCTAGCAGTGATGAAAGC 

ATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTGGGTAGCCGTCATTATCAT 
ATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTGGGTAGCCGTCATTATCAT 
ATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTGGGTAGCCGTCATTATCAT 
ATATTCTCCATGGGCGTAAGCAAGTATGGAATACAGATTTTGGTAGCCGTCATTATCAT 
ATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCCGTCATTATCAT 
ATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCAGTCATTATCAT 
ATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCAGTCATTATCAT 
ATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCCGTCATTATCAT 
ATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCCGTCATTATCAT 

TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGT-AC 
TAT--CTCTA--CCTTGCG-ATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTAT
TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGT-AC
TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGT-AC
TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGT-AC
TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGT-AC
TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGT-AC
TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATGGACATAGTGGT-AC
TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGT-AC 
TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGCAATAGTGGT-AC 
TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGT-AC 

TGTCGCTT-ATACTAATCATCAAGAGAA-AAAAACGCAATATAAAGGAC-GTTATTTTAA 
TACCAATAGATATTA--CACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCAC
TGTCGCTT-ATACTAATCATCAAGAGAA-AAAAACGCAATATAAAGGAC-GTTATTTTAA 
TGTCGCTT-ATACTAATCATCAAGAGAA-AAAAACGCAATATAAAGGAC-GTTATTTTAA 
TGTCGCTT-ATACTAATCATCAAGAGAA-AAAAACGCAATATAAAGGAC-GTTATTTTAA 
TGTCGCTT-ATACTAATCATCAAGAGAA-AAAAACGCAATATAAAGGAC-GTTATTTTAA 
TGTCGCTT-ATACTAATCATCAAGAGAA-AAAAACGCAATATAAAGGAC-GTTATTTTAA 
TGTCGCTTTATACTAATCATCAAGAGGAGAAAAACGCAATATAAAGGAC-GTTATTTTAA 
TGTCGCTT-ATACTAATCATCAAGAGAA-AAAAACGCAATATAAAGGAC-GTTATTTTAA 
TGTCGCTT-ATACTAATCATCAAGAGAA-AAAAACGCAATATAAAGGAC-GTTATTTTAA
TGTCGCTT-ATACTAATCATCAAGAGAA-AAAAACGCAATATAAAGGAC-GTTATTTTAA 

AACTTCTGCGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTTATGGAT--GAATTG
ATTTTCTAAATCTAGTGA-CTTTGTATTGTC-TATTGATCCAAATGGCAAGTCTGAATT-
AACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGAT--GAATTG
AACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGAT--GAATTG
AACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGAT--GAATTG
AACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGAT--GAATTG
AACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGAT--GAATTG
AACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGAT--GAATTG
AACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGAT--GAATTG
AACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGAT--GAATTG
AACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGAT--GAATTG

ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCA
ATTTGTC-CAAGAGCGCTATA-ATGCCT--TAAAAGCGAACTATCTTCGACAGCTTAACG
ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCA
ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCA



SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808
SEQ2809
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 






ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCA 
ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCA 
ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCA 
ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCA 
ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCA 
ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCA 
ACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCA 

CAACAGAC CCTTTTCGTTATCGAAAACCATTTG-AGGCACAGGCTCCTAAATA 

TAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATCAATA 

CAACAGAC CCTTTTCATTATCGAAAACCATTTG-AGGCACAGGCTCCTAAATA 

CAACAGAC CCTTTTCATTATCGAAAACCATTTG-AGGCACAGGCTCCTAAATA 

CAACAGAC CCTTTTCATTATCGAAAACCATTTG-AGGCACAGGCTCCTAAATA 

CAACAGAC CCTTTTCATTATCGAAAACCATTTG-AGGCACAGGCTCCTAAATA 

CAACAGAC CCTTTTCATTATCGAAAACCATTTG-AGGCACAGGCTCCTAAATA 

CAACAGAC CCTTTTCATTATCGAAAACCATTTG-AGGCACAGGCTCCTAAATA 

CAACAGAC CCTTTTCATTATCGAAAACCATTTG-AGGCACAGGCTCCTAAATA 

CAACAGAC CCTTTTCATTATCGAAAACCATTTG-AGGCACAGGCTCCTAAATA 

CAACAGAC CCTTTTCATTATCGAAAACCATTTG-AGGCACAGGCTCCTAAATA 

G-TACAACTAAATGTAGAAAATATTCAAGCTAATTCGAATGTTAAAGCA GGTATTT 

GGTATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGT 

G-TACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCA GGTATGT 

G-TACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCA GGTATGT 

G-TACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCA GGTATGT 

G-TACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCA GGTATGT 

G-TACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCA GGTATGT

G-TACAACTAAATGTAGAAAATATTCAAGCTAATTCGAATGTTAAAGCA GGTATGT 

G-TACAACTAAATGTAGAAAATATTCAAGCTAATTCGAATGTTAAAGCA GGTATGT 

G-TACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCA GGTATGT 

G-TACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCA GGTATGT 

TTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 

TCTTACCAACTCATCCTACTGGTCTT CTCAAAACAGGAACAATTGAT-AGGCACCA 

TTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 
TTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 
TTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 
TTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 
TTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 
TTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 
TTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 
TTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 
TTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 

AAGAGAATATCAGTAAAGAAGATAGACAAAAGATT-AAAGAACTTTCTTTGTCACAGGGA 
AAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAAT 
AAGAGAATATCAGTAAAGAAGATAGACAAAAGATT-AAAGAACTTTCTTTGTCACAGGGA 
AAGAGAATATCAGTAAAGAAGATAGACAAAAGATT-AAAGAACTTTCTTTGTCACAGGGA 
AAGAGAATATCAGTAAAGAAGATAGACAAAAGATT-AAAGAACTTTCTTTGTCACAGGGA 
AAGAGAATATCAGTAAAGAAGATAGACAAAAGATT-AAAGAACTTTCTTTGTCACAGGGA 
AAGAGAATATCAGTAAAGAAGATAGACAAAAGATT-AAAGAACTTTCTTTGTCACAGGGA 
AAGAGAATATCAGTAAAGAAGATAGACAAAAGATT-AAAGAACTTTCTTTGTCACAGGGA 
AAGAGAATATCAGTAAAGAAGATAGACAAAAGATT-AAAGAACTTTCTTTGTCACAGGGA 
AAGAGAATATCAGTAAAGAAGATAGACAAAAGATT-AAAGAACTTTCTTTGTCACAGGGA
AAGAGAATATCAGTAAAGAAGATAGACAAAAGATT-AAAGAACTTTCTTTGTCACAGGGA

TACGTTA-AACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTA 

TCCGTGGCAGTTGTTGAATTTTTCTGATCCA TCATCTCAAAAAATTCACGATGATTA 

TACGTTA-AACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTA 
TACGTTA-AACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTA 
TACGTTA-AACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTA 
TACGTTA-AACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTA 
TACGTTA-AACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTA 
TACGTTA-AACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTA 
TACGTTA-AACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTA 
TACGTTA-AACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTA 
TACGTTA-AACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTA
SEQ2801
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 

SEQ2802

SEQ2803 

SEQ2804 

SEQ2805 

SEQ2806 

SEQ2807 

SEQ2808 

SEQ2809 

SEQ2810 

SEQ2811 

SEQ2801
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 






TCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGA 
TTTAAACATTATGGTGTGAAGGAGTTAGAAATTGA-GAGCATTGCTTTAGGATTAGGTG
TCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGA
TCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGA 
TCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGA 
TCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGA 
TCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGA 
TCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGA 
TCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGA 
TCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGA 
TCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGA 

AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTT 

TAATAGCAAAGAAAACACACTGATAAAGATGGCAGAT TATCGTTTGAAAAATT 

AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTT
AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTT 
AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTT 
AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTT 
AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTT 
AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTT 
AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTT 
AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTT 
AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTT 

GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCCTT

GGAGAGAC — CCGATAC CAAAACCTTTTTAA AAGACTCCTATTATAGTATT 

GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTT 
GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTT 
GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTT 
GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTT 
GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTT 
GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGTGTGGAATACATCCTT 
GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGTGTGGAATACATCCTT 
GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTT 
GGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTT 

GCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 

A — AGAAAGAA 

GCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 
GCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 
GCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 
GCCACAAATAAACATAATCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 
GCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 
GCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 
GCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 
GCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 
GCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 

GGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAA 

GGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAA 
GGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAA 
GGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAA 
GGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAA 
GGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAA
GGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAGGTTGATGGTAAAAGAGGCAA 
GGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAGGTTGATGGTAAAAGAGGCAA
GGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAA 
GGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAA 



GGAGAGTGGAAACATCCTCTG- 



GGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAG 
GGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAG 
GGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAG 
GGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAG 
GGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAG
GAAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAG
SEQ2809
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810
SEQ2811

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 




GAAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAG 
GGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAG 
GGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAG 



GATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACG 
GATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACG 
GATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACG 
GATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACG 
GATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACG 
GATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACG 
GATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACG 
GATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACG 
GATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACG 



TTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGT 
TTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGT 
TTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGT 
TTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGT 
TTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGT 
TTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGT 
TTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGT 
TTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGT 
TTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGT 



ACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATT 
ACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATT 
ACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATT 
ACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATT 
ACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATT 
ACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATT 
ACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATT 
ACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATT 
ACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATT 



TTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAA 
TTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAA 
TTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAA 
TTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAA 
TTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAA 
TTTGTCCAAGAGCGCTATAACGCCTTAAAAGCGAACTATCTTCGACAGCTTAATGGTAA 
TTTGTCCAAGAGCGCTATAACGCCTTAAAAGCGAACTATCTTCGACAGCTTAATGGTAA 
TTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAA 
TTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAA 



GATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT 
GATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT 
GATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT 
GATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT
GATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT 
GATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT 
GATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT 
GATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT 
GATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT 



TTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTT
TTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTT
SEQ2805
SEQ2806 
SEQ2807 
SEQ2808
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810
SEQ2811

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811

SEQ2801
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 




TTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTT
TTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTT 
TTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTT 
TTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTT 
TTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTT 
TTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTT 
TTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTT



CCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATT 
CCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATT 
CCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATT 
CCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATT 
CCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATT 
CCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATT 
CCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATT 
CCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATT 
CCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATT 



GATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCA 
GATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCA 
GATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCA 
GATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCA 
GATTCACAACCAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCA 
GATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCA
GATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCA 
GATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCA 
GATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCA 



TTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTA 
TTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTA 
TTGTTGAATTTTTCTGATCCATCATCTCAAAGAATTCACGATGATTACTTTAAACATTA 
TTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTA 
TTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTA
TTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTA 
TTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTA 
TTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTA 
TTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTA



GGTGTGAAGGAGTTAGAAATTGAGAG--CATTGCTTTAGGATTAGGTGCTAATAGCAAA
GGTGTGAAGGAGTTAGAAATTGAGAG--CATTGCTTTAGGATTAGGTGCTAATAGCAAA
GGTGTGAAGGAGTTAGAAAATTGAGAGCCATTGCTTTAGGATTAGGTGCTAATAGCAAA
GGTGTGAAGGAGTTAGAAATTGAGAG--CATTGCTTTAGGATTAGGTGCTAATAGCAAA
GGTGTGAAGGAGTTAGAAATTGAGAG--CATTGCTTTAGGATTAGGTGCTAATAGCAAA
GGTGTGAAGGAGTTAGAAATTGAGAG--CATTGCTTTAGGATTAGGTGCTAATAGCAAA
GGTGTGAAGGAGTTAGAAATTGAGAG--CATTGCTTTAGGATTAGGTGCTAATAGCAAA
GGTGTGAAGGAGTTAGAAATTGAGAG--CATTGCTTTAGGATTAGGTGCTAATAGCAAA
GGTGTGAAGGAGTTAGAAATTGAGAG--CATTGCTTTAGGATTAGGTGCTAATAGCAAA



AAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACC 
AAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACC 
AAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACC 
AAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACC 
AAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACC 
AAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACC 
AAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACC 
AAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACC 
AAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACC
SEQ2801
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 

SEQ2801 
SEQ2802 
SEQ2803 
SEQ2804 
SEQ2805 
SEQ2806 
SEQ2807 
SEQ2808 
SEQ2809 
SEQ2810 
SEQ2811 







AAACCTTTTTAAAAGACTCCTATTATGTATTAAGAAAGAA 

AAACCTTTTTAAAAGACTCCTATTATAGTATTAAGAAAGAATGGTCTAAAGAAAGAGAG 
AAACCTTTTTAAAAGA 

AAACCTTTTTAAAAGACTCCTATTATGTATTAAGAAAGA 

AAACCTTTTTAAAAGACT 

AAACCTTTTTAAAAGACTCCTATTATAGT 

AAACCTTTTTAAAAGACTCCTATTATAGTATTAAGAAAG 

AAACCTTTTTAAAAGACTCCTATTATAGTATTAAG 

AAACCTTTTTAAAAGACTCCTATTATAGTATTAAGAAAGAATGG 



GAACATATGGT CCA 



>SEQ ID NO 2850:62_1169NT frame: 1 

FVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHLISNMGANTVRV
KVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRGYLKREAKGVVD
ILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKKTQYKGRYFKTS
AAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFRYRKPFEAQAPKYVQLNV
ENIQANSNVKAGIFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELSLSQGYVKLLNA
YHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFISSGSFGATINAW
QDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDGKRGKGEWKHPL
MTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMNGSKVTFSKSSD
FVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQINMVLRNTKIV
EDMEKVKATERFLPTHPTGLLKTGTIDRHQKTFDSQTDISFGKDFIEVRIPWQLLNFSDP
SSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWERPDTKTFLKDSY 
YSI.ER 

>SEQ ID NO 2851:62_18RS21 frame: 1 

KGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG
YLKREAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKK
TQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFE
AQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELS
LSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFIS
SGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDG
KRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMN 
GSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQ 
INMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVR 
IPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWER 
PDTKTFLKDSYYVLRK

>SEQ ID NO 2852:62_2603 frame: 3

LKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHKDFPITQKTYREWFHLISN
MGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRGYLK
REAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKKTQY
KGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFEAQA
PKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELSLSQ
GYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFISSGS
FGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDGKRG
KGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMNGSK
VTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQINM
VLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVRIPW
QLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWERPDT 
KTFLKDSYYSIKKEWSKERERTYGP

>SEQ ID NO 2853:62_A909 frame: 1

KGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG
YLKREAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKK
TQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFE
AQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELS
LSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFIS
SGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDG
KRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMN
GSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQ
INMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVR
IPWQLLNFSDPSSQRIHDDYFKHYGVKELEN.EPLL.D.VLIAKKTH..RWQIIV.KIGR
DPIPKPF.K 

>SEQ ID NO 2854:62_A909 frame: 1 

KGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG
YLKREAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKK
TQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFE
AQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELS
LSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFIS
SGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDG
KRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMN
GSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQ
INMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVR
IPWQLLNFSDPSSQRIHDDYFKHYGVKELEN.EPLL.D.VLIAKKTH..RWQIIV.KIGR
DPIPKPF.K

>SEQ ID NO 2855:62_CJB110 frame: 1

YYFDGSLYLPKGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPIT
QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNAS
ITAFNDNYRGYLKREAKGVVDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDWNSGT
VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT
DPFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK
EDRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR
LLEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHNQFLWGDAQVFNQGYGLLGFK
NAKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI
TPKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP
PKKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDI
SFGKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKM
ADYRLKNWERPDTKTFLKDSYYVLRK

>SEQ ID NO 2856:62_COH1 frame: 2

LPQGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWF
HLISNMGANTVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNASITAFNDNY
RGYLKREAKGVVDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQE
KKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKP
FEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKE
LSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESF
ISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQV
DGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRK
MNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNF
EQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQPDISFGKDFIE 
VRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNW 
ERPDTKTFLKD 

>SEQ ID NO 2857:62_H36B frame: 2 

RGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG
YLKREAKGVVDILHGRKQVWNTDFGSSHYHYDLSPWVLGYVVGDDGHSGTVALY

>SEQ ID NO 2858:62_JM9130013 frame: 3 

FWKGDTVLHKPTNKPFWKGVDVESSLAGYHHNDFPITQKTYREWFHLISNMGANTVRV 
KVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRGYLKREAKGWD 
ILHGRKQVWNTDFGSSHYHYDLSPWVLGYWGDDWNSGTVAYTNHQEKKTQYKGRYFKTS 
VAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFEAQAPKYVQLNV 
ENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELSLSQGYVKLLNA 
YHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFISSGSFGATINAW 

Table 25: Comparative Sequences relating to SAG1552 
(conserved hypothetical protein) 

QDDWNARVWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDGKRGKEEWKHPL 
MTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMNGSKVTFSKSSD 
FVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQINMVLRNTKIV 
EDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVRIPWQLLNFSDP 
SSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWERPDTKTFLKDSY 
YSIKK 

>SEQ ID NO 2859:62_M732 frame: 2 

TRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHLISNMGAN 
TVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNASITAFNDNYRGYLKREAK 
GWDILHGRKQVWNTDFGSRHYHYDLSPWLGYWGDDCNSGTVAYTNHQEKKTQYKGRY 
FKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFEAQAPKYV 
QLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELSLSQGYVK 
LLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFISSGSFGAT 
INAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDGKRGKGEW 
KHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMNGSKVTFS 
KSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQINMVLRN 
TKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVRIPWQLLN 
FSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWERPDTKTFL 
KDSYYSIK 

>SEQ ID NO 2860:62_M781 frame: 1 

FDGSLYLPQGLLKENTRTNFWKGDTVLHKPTNKPFWKGVDVESSLAGYHHNDFPITQK 
TYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNASIT 
AFNDNYRGYLKREAKGVVDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDWNSGTVA 
YTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDP 
FHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKED 
RQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLL 
EDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNA 
KHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITP 
KSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPK 
KNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISF 
GKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMAD 
YRLKNWERPDTKTFLKDSYYSIKKEW 

SEQ2850 FWKGDTVLHKPTNKPFWKGVDVESSLAGYHHNDFPIT 
SEQ2851 KGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPIT 
SEQ2852 LKENTRTNFWKGDTVLHKPTNKPFWKGVDVESSLAGYHHNDFPIT 
SEQ2853 KGLLKENTRTNFWKGDTVLHKPTNKPFWKGVDVESSLAGYHHNDFPIT 
SEQ2854 KGLLKENTRTNFWKGDTVLHKPTNKPFWKGVDVESSLAGYHHNDFPIT 
SEQ2855 YFDGSLYLPKGLLKENTRTNFWKGDTVLHKPTNKPFWKGVDVESSLAGYHHNDFPIT 
SEQ2856 LPQGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPIT 
SEQ2857 RGLLKENTRTNFWKGDTVLHKPTNKPFWKGVDVESSLAGYHHNDFPIT 
SEQ2858 FWKGDTVLHKPTNKPFWKGVDVESSLAGYHHNDFPIT 
SEQ2859 TRTNFWKGDTVLHKPTNKPFWKGVDVESSLAGYHHNDFPIT 
SEQ2860 -FDGSLYLPQGLLKENTRTNFVVKGDTVLHKPTNKPFWKGVDVESSLAGYHHNDFPIT 

SEQ2850 QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNAS 
SEQ2851 QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNAS 
SEQ2852 QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNAS 
SEQ2853 QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNAS 
SEQ2854 QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNAS 
SEQ2855 QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNAS 
SEQ2856 QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNAS 
SEQ2857 QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNAS 
SEQ2858 QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNAS 
SEQ2859 QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNAS 
SEQ2860 QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNAS 

SEQ2850 ITAFNDNYRGYLKREAKGWDILHGRKQVWNTDFGSRHYHYDLSPWVLGYWGDDWNSGT 
SEQ2851 ITAFNDNYRGYLKREAKGWDILHGRKQVWNTDLGSRHYHYDLSPWVLGYWGDDWNSGT 
SEQ2852 ITAFNDNYRGYLKREAKGWDILHGRKQVWNTDLGSRHYHYDLSPWVLGYWGDDWNSGT 
SEQ2853 ITAFNDNYRGYLKREAKGWDILHGRKQVWNTDLGSRHYHYDLSPWVLGYWGDDWNSGT 
SEQ2854 ITAFNDNYRGYLKREAKGWDILHGRKQVWNTDLGSRHYHYDLSPWVLGYWGDDWNSGT 
SEQ2855 ITAFNDNYRGYLKREAKGWDILHGRKQVWNTDFGSRHYHYDLSPWVLGYWGDDWNSGT 
SEQ2856 ITAFNDNYRGYLKREAKGWDILHGRKQVWNTDFGSRHYHYDLSPWVLGYWGDDWNSGT 
SEQ2857 ITAFNDNYRGYLKREAKGWDILHGRKQVWNTDFGSSHYHYDLSPWVLGYWGDDGHSGT 
SEQ2858 ITAFNDNYRGYLKREAKGWDILHGRKQVWNTDFGSSHYHYDLSPWVLGYWGDDWNSGT 
SEQ2859 ITAFNDNYRGYLKREAKGWDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDCNSGT 
SEQ2860 ITAFNDNYRGYLKREAKGWDILHGRKQVWNTDFGSRHYHYDLSPWVLGYWGDDWNSGT 

SEQ2850 VAYTNHQEKKTQYKGRYFKTSAAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 
SEQ2851 VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 
SEQ2852 VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 
SEQ2853 VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 
SEQ2854 VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 
SEQ2855 VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 
SEQ2856 VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 
SEQ2857 VALY 
SEQ2858 VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 
SEQ2859 VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 
SEQ2860 VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 

SEQ2850 PFRYRKPFEAQAPKYVQLNVENIQANSNVKAGIFAAYKAIDFHPRYKDYLLFDKENISK 
SEQ2851 PFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK 
SEQ2852 PFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK 
SEQ2853 PFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK 
SEQ2854 PFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK 
SEQ2855 PFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK 
SEQ2856 PFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK 
SEQ2857 
SEQ2858 PFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK 
SEQ2859 PFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK 
SEQ2860 PFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK 

SEQ2850 DRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 
SEQ2851 DRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 
SEQ2852 DRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 
SEQ2853 DRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 
SEQ2854 DRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 
SEQ2855 DRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 
SEQ2856 DRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 
SEQ2857 
SEQ2858 DRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 
SEQ2859 DRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 
SEQ2860 DRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 

SEQ2850 LEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFK 
SEQ2851 LEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFK 
SEQ2852 LEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFK 
SEQ2853 LEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFK 
SEQ2854 LEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFK 
SEQ2855 LEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHNQFLWGDAQVFNQGYGLLGFK 
SEQ2856 LEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFK 
SEQ2857 
SEQ2858 LEDYESFISSGSFGATINAWQDDWNARVWNTSFATNKHSQFLWGDAQVFNQGYGLLGFK 
SEQ2859 LEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFK 
SEQ2860 LEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFK 

SEQ2850 AKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 
SEQ2851 AKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 
SEQ2852 AKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 
SEQ2853 AKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 
SEQ2854 AKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 
SEQ2855 AKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 
SEQ2856 AKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 
SEQ2857 
SEQ2858 AKHHYQVDGKRGKEEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 
SEQ2859 AKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 
SEQ2860 AKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 

SEQ2850 PKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP 
SEQ2851 PKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP 
SEQ2852 PKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP 
SEQ2853 PKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP 
SEQ2854 PKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP 
SEQ2855 PKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP 
SEQ2856 PKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP 
SEQ2857 
SEQ2858 PKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP 
SEQ2859 PKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP 
SEQ2860 PKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP 

SEQ2850 KKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTIDRHQKTFDSQTDI 
SEQ2851 KKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDI 
SEQ2852 KKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDI 
SEQ2853 KKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDI 
SEQ2854 KKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDI 
SEQ2855 KKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDI 
SEQ2856 KKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQPDI 
SEQ2857 
SEQ2858 KKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDI 
SEQ2859 KKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDI 
SEQ2860 KKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDI 

SEQ2850 FGKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEXESIALGLGANSKENTLIKM 
SEQ2851 FGKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKM 
SEQ2852 FGKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKM 
SEQ2853 FGKDFIEVRIPWQLLNFSDPSSQRIHDDYFKHYGVKELENEPLLDVLIAKKTHRWQIIV 
SEQ2854 FGKDFIEVRIPWQLLNFSDPSSQRIHDDYFKHYGVKELENEPLLDVLIAKKTHRWQIIV 
SEQ2855 FGKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKM 
SEQ2856 FGKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKM 
SEQ2857 
SEQ2858 FGKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKM 
SEQ2859 FGKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKM 
SEQ2860 FGKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKM 

SEQ2850 DYRLKNWERPDTKTFLKDSYYSIER 
SEQ2851 DYRLKNWERPDTKTFLKDSYYVLRK 
SEQ2852 DYRLKNWERPDTKTFLKDSYYSIKKEWSKERERTYGP 
SEQ2853 IGRDPIPKPFK 
SEQ2854 IGRDPIPKPFK 
SEQ2855 DYRLKNWERPDTKTFLKDSYYVLRK 
SEQ2856 DYRLKNWERPDTKTFLKD 
SEQ2857 
SEQ2858 DYRLKNWERPDTKTFLKDSYYSIKK 
SEQ2859 DYRLKNWERPDTKTFLKDSYYSIK 
SEQ2860 DYRLKNWERPDTKTFLKDSYYSIKKEW 

Table 29: Comparative Sequences relating to SAG1641 (YaeC family protein) 

SEQ ID NO. 2901: SAG1641 FROM THE 090 GBS TYPE Ia STRAIN 



AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAA 
GCACGTTGGGATAAAATTGAAAAGCTAGTAGGCGATAAAGCTAAAATCAAATTCACAGAATTTACAGATTATACACAA 
CCAAATCAAGCGACAGCCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAG 
GAAAATAAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCCCCAATTCGTATCTATTCTGAGAAGGTAAAATCT 
CTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTT 
CAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAA 
GATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAAT 
ACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGG 
ATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTAT 
CACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGGAACCCAGCTTTCTTGTACAA 

SEQ ID NO. 2902: SAG1641 FROM THE 1169NT1 GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

ATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAG 
CACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAAC 
CAAATCAAGCGACAGCCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGG 
AAAATAAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTC 
TTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTC 
AGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGG 
ATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAATA 
CATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGGA 
TTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATC 
ACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 

SEQ ID NO. 2903: SAG1641 FROM THE 18RS21 GBS TYPE II STRAIN 

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAA 
GCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAA 
CCAAATCAAGCGACAGCCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAG 
GAAAATAAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCT 
CTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTT 
CAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAG 
GATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAAT 
ACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGG 
ATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTAT 
CACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCAC 

SEQ ID NO. 2904: SAG1641 FROM THE 2603 V/R GBS TYPE V STRAIN 

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAA 
GCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAA 
CCAAATCAAGCGACAGCCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAG 
GAAAATAAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCT 
CTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTT 
CAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAG 
GATATTAATATTCAGGAGTTAGATGCCAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAAT 
ACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGG 
ATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTAT 
CACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 

SEQ ID NO. 2905: SAG1641 FROM THE A909 GBS TYPE Ia STRAIN 

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAA 
GCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAA 
CCAAATCAAGCGACAGCCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAG 
GAAAATAAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCT 
CTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTT 
CAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAG 
GATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAAT 
ACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGG 
ATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTAT 
CACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 

SEQ ID NO. 2906: SAG1641 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

AAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGT 
AGGCGATAAAGCTAAAATCAAATTCACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAGGATGT 
GGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCACTTGA 
AAAGACTTACTTAGCCCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTAT 
TGCAATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGT 
TTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAAGATATTAATATTCAGGAGTTAGATGCGAG 
TCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACC 
TTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTG 
GAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAA 
AGATACTTCAGCTGATATTCCACAATGGAA 

SEQ ID NO. 2907: SAG1641 FROM THE COH1 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

AGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTG 
GGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCA 
AGCGACAGCCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAA 
GAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAA 
ATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTACTTCAGTCAGC 
AGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAA 
TATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAATACATACAT 
TGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGGATTAATAT 
CATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGA 
TGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 

SEQ ID NO. 2908: SAG1641 FROM THE H36b GBS TYPE Ib STRAIN 

AAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCAC 
GTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAA 
ATCAAGCGACAGCCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAA 
ATAAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTA 
AAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGT 
CAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATA 
TTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAATACAT 
ACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGGATTA 
ATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACA 
CAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 

SEQ ID NO. 2909: SAG1641 FROM THE JM9130013 GBS TYPE VIII STRAIN 

TTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGA 
TAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGC 
GACAGCCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAA 
AAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATT 
GAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGG 
TTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATAT 
TCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAATACATACATTGA 
GCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGGATTAATATCAT 
TGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGA 
AGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 

SEQ ID NO. 2910: SAG1641 FROM THE M732 GBS TYPE III STRAIN 

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAA 
GCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAA 
CCAAATCAAGCGACAGCCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAG 
GAAAATAAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCT 
CTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTT 
CAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAG 
GATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAAT 
ACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGG 
ATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTAT 
CACACAGATGAAGTGAAAAAAGTTATCAAAGATAC 

SEQ ID NO. 2911: SAG1641 FROM THE M781 GBS TYPE III STRAIN 

AGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTG 
GGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCA 
AGCGACAGCCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAA 
GAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAA 
ATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGC 
AGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAA 
TATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAATACATACAT 
TGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGGATTAATAT 
CATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTGGGATGCTTATCACACAGA 
TGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 



SEQ2901 ATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACC 
SEQ2902 ATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACC 
SEQ2903 ATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACC 
SEQ2904 ATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACC 
SEQ2905 ATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACC 
SEQ2906 AAGTAAAGTTGTTAAAGTTGGTGTTATGACC 
SEQ2907 AGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACC 
SEQ2908 AAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACC 
SEQ2909 TTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACC 
SEQ2910 ATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACC 
SEQ2911 AGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACC 

SEQ2901 TTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGCGATAAAGCT 
SEQ2902 TTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCT 
SEQ2903 TTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCT 
SEQ2904 TTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCT 
SEQ2905 TTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCT 
SEQ2906 TTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGCGATAAAGCT 
SEQ2907 TTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCT 
SEQ2908 TTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCT 
SEQ2909 TTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCT 
SEQ2910 TTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCT 
SEQ2911 TTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGTGATAAAGCT 

SEQ2901 AAAATCAAATTCACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 
SEQ2902 AAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 
SEQ2903 AAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 
SEQ2904 AAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 
SEQ2905 AAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 
SEQ2906 AAAATCAAATTCACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 
SEQ2907 AAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 
SEQ2908 AAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 
SEQ2909 AAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 
SEQ2910 AAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 
SEQ2911 AAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 

SEQ2901 GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAAT 
SEQ2902 GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAAT 
SEQ2903 GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAAT 
SEQ2904 GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAAT 
SEQ2905 GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAAT 
SEQ2906 GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAAT 
SEQ2907 GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAAT 
SEQ2908 GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAAT 
SEQ2909 GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAAT 
SEQ2910 GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAAT 
SEQ2911 GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAAT 

SEQ2901 AAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCCCCAATTCGTATCTATTCTGAG 
SEQ2902 AAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAG 
SEQ2903 AAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAG 
SEQ2904 AAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAG 
SEQ2905 AAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAG 
SEQ2906 AAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCCCCAATTCGTATCTATTCTGAG 
SEQ2907 AAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAG 
SEQ2908 AAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAG 
SEQ2909 AAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAG 
SEQ2910 AAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAG 
SEQ2911 AAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAG 

SEQ2901 AAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCA 
SEQ2902 AAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCA 
SEQ2903 AAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCA 
SEQ2904 AAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCA 
SEQ2905 AAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCA 
SEQ2906 AAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCA 
SEQ2907 AAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCA 
SEQ2908 AAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCA 
SEQ2909 AAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCA 
SEQ2910 AAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCA 
SEQ2911 AAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCA 

SEQ2901 ACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTT 
SEQ2902 ACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTT 
SEQ2903 ACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTT 
SEQ2904 ACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTT 
SEQ2905 ACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTT 
SEQ2906 ACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTT 
SEQ2907 ACAAATGGTAGCCGTGCATTGTATGTACTTCAGTCAGCAGGTTTAATCAAATTGAATGTT 
SEQ2908 ACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTT 
SEQ2909 ACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTT 
SEQ2910 ACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTT 
SEQ2911 ACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTT 

SEQ2901 TCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAAGATATTAATATT 
SEQ2902 TCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATT 
SEQ2903 TCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATT 
SEQ2904 TCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATT 
SEQ2905 TCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATT 
SEQ2906 TCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAAGATATTAATATT 
SEQ2907 TCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATT 
SEQ2908 TCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATT 
SEQ2909 TCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATT 
SEQ2910 TCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATT 
SEQ2911 TCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATT 

SEQ2901 CAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATT 
SEQ2902 CAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATT 
SEQ2903 CAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATT 
SEQ2904 CAGGAGTTAGATGCCAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATT 
SEQ2905 CAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATT 
SEQ2906 CAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATT 
SEQ2907 CAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATT 
SEQ2908 CAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATT 
SEQ2909 CAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATT 
SEQ2910 CAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATT 
SEQ2911 CAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATT 

SEQ2901 AATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAA 
SEQ2902 AATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAA 
SEQ2903 AATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAA 
SEQ2904 AATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAA 
SEQ2905 AATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAA 
SEQ2906 AATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAA 
SEQ2907 AATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAA 
SEQ2908 AATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAA 
SEQ2909 AATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAA 
SEQ2910 AATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAA 
SEQ2911 AATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAA 

SEQ2901 TCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAG 
SEQ2902 TCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAG 
SEQ2903 TCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAG 
SEQ2904 TCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAG 
SEQ2905 TCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAG 
SEQ2906 TCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAG 
SEQ2907 TCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAG 
SEQ2908 TCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAG 
SEQ2909 TCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAG 
SEQ2910 TCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAG 
SEQ2911 TCAGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAG 

SEQ2901 CAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTGAAA 
SEQ2902 CAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTGAAA 
SEQ2903 CAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTGAAA 
SEQ2904 CAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTGAAA 
SEQ2905 CAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTGAAA 
SEQ2906 CAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTGAAA 
SEQ2907 CAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTGAAA 
SEQ2908 CAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTGAAA 
SEQ2909 CAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTGAAA 
SEQ2910 CAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTGAAA 
SEQ2911 CAAAAGAACGCTAAAGCTATCCAAGCTATCTGGGATGCTTATCACACAGATGAAGTGAAA 

SEQ2901 AAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGGAACCCAGCTTTCTTGTACAA 
SEQ2902 AAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 
SEQ2903 AAAGTTATCAAAGATACTTCAGCTGATATTCCAC 
SEQ2904 AAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 
SEQ2905 AAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 
SEQ2906 AAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGGAA 
SEQ2907 AAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 
SEQ2908 AAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 
SEQ2909 AAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 
SEQ2910 AAAGTTATCAAAGATAC 
SEQ2911 AAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG 



>SEQ ID NO 2950: 35_090 frame: 1 

NQEVSASSTSSKWKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK 
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA 
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII 
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK 
KVIKDTSADIPQWNPAFLY 

>SEQ ID NO 2951: 35_1169NT frame: 3 

QEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKD 
VDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDAT 
NGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIIN 
NTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKK 
VIKDTSADIPQW 

>SEQ ID NO 2952: 35_18RS21 frame: 1 

NQEVSASSTSSKWKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK 
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA 
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII 
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK 
KVIKDTSADIP 



>SEQ ID NO 2953:35_2603 frame: 1 

NQEVSASSTSSKWKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK 
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA 
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII 
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK 
KVIKDTSADIPQW 

Table 29: Comparative Sequences relating to SAG1641 (YaeC family protein)

>SEQ ID NO 2954:35_A909 frame: 1

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
DVDINAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDA
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
KVIKDTSADIPQW 

>SEQ ID NO 2955:35_CJB110 frame: 2

SKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVDINAFQHY 
NFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATNGSRALYVL 
QSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNTYIEQANL 
KPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKVIKDTSADI 
PQW 

>SEQ ID NO 2956:35_COH1 frame: 2

VSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVD
INAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDATNG
SRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNT
YIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKVI
KDTSADIPQW 

>SEQ ID NO 2957:35_H36B frame: 3 

EVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDV
DINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATN
GSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINN
TYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKV
IKDTSADIPQW 

>SEQ ID NO 2958:35_JM9130013 frame: 2 

SASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVDI 
NAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDATNGS
RALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNTY
IEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKVIK
DTSADIPQW 

>SEQ ID NO 2959:35_M732 frame: 1 

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
DVDINAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDA
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
KVIKD 

>SEQ ID NO 2960:35_M781 frame: 2

VSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVD
INAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDATNG
SRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNT
YIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAIWDAYHTDEVKKVI
KDTSADIPQW 

SEQ2950 QEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
SEQ2951 QEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
SEQ2952 QEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
SEQ2953 QEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
SEQ2954 QEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
SEQ2955 ---------SKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
SEQ2956 --VSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
SEQ2957 -EVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
SEQ2958 ---SASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
SEQ2959 QEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
SEQ2960 --VSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK

SEQ2950 DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
SEQ2951 DVDINAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDA
SEQ2952 DVDINAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDA
SEQ2953 DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
SEQ2954 DVDINAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDA
SEQ2955 DVDINAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDA
SEQ2956 DVDINAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDA
SEQ2957 DVDINAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDA
SEQ2958 DVDINAFQHYNFLENWNKENKKNLIPLEKTYIAPIRIYSEKVKSLKKLKKGATIAIPNDA
SEQ2959 DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
SEQ2960 DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA

SEQ2950 TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
SEQ2951 TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
SEQ2952 TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
SEQ2953 TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
SEQ2954 TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
SEQ2955 TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
SEQ2956 TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
SEQ2957 TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
SEQ2958 TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
SEQ2959 TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
SEQ2960 TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII

SEQ2950 NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
SEQ2951 NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
SEQ2952 NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
SEQ2953 NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
SEQ2954 NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
SEQ2955 NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
SEQ2956 NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
SEQ2957 NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
SEQ2958 NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
SEQ2959 NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
SEQ2960 NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAIWDAYHTDEVK

SEQ2950 KVIKDTSADIPQWNPAFLY
SEQ2951 KVIKDTSADIPQW
SEQ2952 KVIKDTSADIP
SEQ2953 KVIKDTSADIPQW
SEQ2954 KVIKDTSADIPQW
SEQ2955 KVIKDTSADIPQW
SEQ2956 KVIKDTSADIPQW
SEQ2957 KVIKDTSADIPQW
SEQ2958 KVIKDTSADIPQW
SEQ2959 KVIKD
SEQ2960 KVIKDTSADIPQW
Table 30: Comparative Sequences relating to SAG2147
(protein of unknown function / lipoprotein, putative)



SEQ ID NO. 3001: SAG2147 FROM THE 1169NT1 GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCC 

AAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCT 

CCAAAACCTTCTCAGGCATCTAATGAAGTCCCAAAATCAAGTTCTCAATCTACAGAAGCT 

AATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAACA 

GAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTAC 

AAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGCG 

GTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGG 

GAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCT 

TCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTT 

AATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC 

SEQ ID NO. 3002: SAG2147 FROM THE 18RS21 GBS TYPE II STRAIN
(REVERSE COMPLEMENT)

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTC

GCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAA 

AACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTA 

CAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAG 

TTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGA 

CAACTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATACTG 

CAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGT 

CTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCT 

CAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGG 

ATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC 

SEQ ID NO. 3003: SAG2147 FROM THE 2603 V/R GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGT 

TCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGT 

AAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATC 

TACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGC

AGTTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGA

GACAACTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATAC 

TGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCA 

GTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGC 

CTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCA 

GGATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTA 

C 



SEQ ID NO. 3004: SAG2147 FROM THE 090 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT)

TAGCCAAAAAATCAAAAATGATTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAAC
AGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAG 
AAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTG 
TAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAA 
CTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATACTGCAG 
GGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTA 
CTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAG 
GAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGA 



SEQ ID NO. 3005: SAG2147 FROM THE A909 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT) 

AAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCA 
TCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTT 
ACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACC 
AGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAG 
ACAAGTGGCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCA 
GCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGT 
GAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACG 
ATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGAATCAAGTTAATTCAGCTATTAAAGCT 
TATCGTGCTCAAGGTTTATCA 

SEQ ID NO. 3006: SAG2147 FROM THE CJB110 GBS NONTYPEABLE STRAIN
(REVERSE COMPLEMENT)

AATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGA
CATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATG
AAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGA 
GTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACCAGTCAGG 
CACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAGACGAGTG 
GCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAA 
TGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAA 
ATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAG 
GTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTG 
CTCAAGGTTTATCAGCTTGGGGTTAC 



SEQ ID NO. 3007: SAG2147 FROM THE COH1 GBS TYPE III STRAIN
(REVERSE COMPLEMENT)

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAA

AGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGA
TGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCA 
ATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACA 
AGCAGTTGTAACAGAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTAC 
TGAGACAACTTACAAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAA 
TACTGCAGGGGCGGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCC 
TCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAA 
TGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGT 
TCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGG 
TTAC 



SEQ ID NO. 3008: SAG2147 FROM THE H36b GBS TYPE Ib STRAIN
(REVERSE COMPLEMENT)

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGC

AGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGT 
AGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAG 
TTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGT 
AGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGC 
TGTTACTGAGACAACTTATAGACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGTAA 
TGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGG 
AGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGT 
TGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGC 
TACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTT 

SEQ ID NO. 3009: SAG2147 FROM THE M732 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGC 

CAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGC 

TCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGC 

TAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAAC 

AGAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTA 

CAAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGC 

GGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTG 

GGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGC 

TTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGT 

TAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTA 



SEQ ID NO. 3010: SAG2147 FROM THE M781 GBS TYPE III STRAIN
(REVERSE COMPLEMENT) 

GTAACCCCAAGCTGATAAACCTTGAGCACGATAAGCTTTAATAGCTGAATTAACTTGATC 
CTGAACTGTAGCTGTTGAACCCCAACCTGGCATCGTTTGGAAAAGTCCTGAAGCTCCTGA 
GGCATTAGCAACATTAGGATTACCATTTGATTCACGGGCAATAATATGTTCCCAAGTAGA
CTGAGGGACTCCTGTTGCAGCAGCCATTTGTGCTGCAGCAGCAGATCCGACCGCCCCTGC 
AGTATTTCCATTGCTCAATACTTGGCCACTTGTCTGGTGTTGAGCAGGTTTGTAAGTTGT 
CTCAGTAACAGCATAAGTTTGTTGTGCCTGACTGGTAGCAGGGGTATTTTCTGTTACAAC 
TGCTTGTTCTACAGCCGCCTCTTCACTCGCAGTAACTTGTTGCTGAGAATTAGCTTCTGT 
AGATTGAGAACTTGATTTTGGGGCTTCATTAGATGCCTGAGAAGGTTTTGGAGCCTGTTT 
TACATCTTCTACTTTTGATTTAGATGTCGCCTTAGTCATTTTTGATTTTTTGGCTACGCG 
AACTTTATCTGCTTTTGACAAAGA 

SEQ3002 

SEQ3003 

SEQ3004 

SEQ3005 AGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCA 

SEQ3007 

SEQ3008 

SEQ3009 

SEQ3010 

SEQ3001 

SEQ3002 

SEQ3003 

SEQ3004 

SEQ3005 CTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTT 

SEQ3007 

SEQ3008 

SEQ3009

SEQ3010 

SEQ3001 

SEQ3002 

SEQ3003 

SEQ3004 

SEQ3005 CTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACC

SEQ3007 

SEQ3008 

SEQ3009 

SEQ3010 

SEQ3001 

SEQ3002 

SEQ3003 

SEQ3004 

SEQ3005 GTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAG 

SEQ3007

SEQ3008 

SEQ3009 

SEQ3010

SEQ3001 

SEQ3002 

SEQ3003 

SEQ3004 

SEQ3005 CAAGTGGCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCA 

SEQ3007 

SEQ3008 

SEQ3009 

SEQ3010 

SEQ3001
SEQ3002
SEQ3003
SEQ3004
SEQ3005 CACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGT
SEQ3007
SEQ3008
SEQ3009
SEQ3010

SEQ3001
SEQ3002
SEQ3003
SEQ3004
SEQ3005 AATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACG
SEQ3007
SEQ3008
SEQ3009
SEQ3010

SEQ3001
SEQ3002
SEQ3003
SEQ3004
SEQ3005 TGCCAGGTTGGGGTTCAACAGCTACAGTTCAGAATCAAGTTAATTCAGCTATTAAAGCT
SEQ3007
SEQ3008
SEQ3009
SEQ3010

SEQ3001 AAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAA
SEQ3002 AAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAA
SEQ3003 AAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAA
SEQ3004 ---------------------------------------------------TAGCCAAA
SEQ3005 ATCGTGCTCAAGGTTTATCASAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAA
SEQ3007 AAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAA
SEQ3008 AAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAA
SEQ3009 AAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAA
SEQ3010 GTAACCCCAAGCTGA TAAACCTTGAGCACGATAAGCTTTAATAGCTGAA

SEQ3001 AATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCA
SEQ3002 AATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCA
SEQ3003 AATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCA
SEQ3004 AATCAAAAATGATTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCA
SEQ3005 AATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCA
SEQ3007 AATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCA
SEQ3008 AATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCA
SEQ3009 AATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCA
SEQ3010 TAACTTGATCCTGAACTGTAGCTGTTGAACCCCAACCTGGCATCGTTTGGAAAAGTCCT

SEQ3001 AACCTTCTCAGGCATCTAATGAAGTCCCAAAATCAAGTTCTCAATCTACAGAAGCTAAT
SEQ3002 AACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAAT
SEQ3003 AACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAAT
SEQ3004 AACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAAT
SEQ3005 AACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAAT
SEQ3007 AACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAAT
SEQ3008 AACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAAT
SEQ3009 AACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAAT
SEQ3010 AAGCTCCTGAGGCATT AGCAACATTAGGATTAC-CATTTGATTCACGGGCAATAAT



SEQ3001 TCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAACAGA 

SEQ3002 TCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGA 

SEQ3003 TCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGA 

SEQ3004 TCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGA 

SEQ3005 TCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGA 

SEQ3007 TCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAACAGA 

SEQ3008 TCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGA 

SEQ3009 TCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAACAGA 

SEQ3010 TGTTCCCAAGTAGACTGAGGGACTCCTGTTGCAGCAGCCATTTGTGCTGCAGCAGCAGA 

SEQ3001 --AAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTA
SEQ3002 --AAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTA
SEQ3003 --AAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTA
SEQ3004 --AAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTA
SEQ3005 --AAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTA
SEQ3007 --AAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTA
SEQ3008 --AAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTA
SEQ3009 --AAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTA
SEQ3010 CCGACCGCCCCTGCAGTATTTCCATTGCTCAATACTTG-GCCACTTGTCTGGTGTTGAG

SEQ3001 AAACCTGCTCAACACCAGACAAGTGGC-CAAGTATTGAGCAATGGAAATACTGCAGGGG
SEQ3002 AGACCTGCTCAACACCAGACGAGTGGC-CAAGTATTGAGTAATGGAAATACTGCAGGGG
SEQ3003 AGACCTGCTCAACACCAGACGAGTGGC-CAAGTATTGAGTAATGGAAATACTGCAGGGG
SEQ3004 AGACCTGCTCAACACCAGACGAGTGGC-CAAGTATTGAGTAATGGAAATACTGCAGGGG
SEQ3005 AGACCTGCTCAACACCAGACGAGTGGC-CAAGTATTGAGTAATGGAAATACTGCAGGGG
SEQ3007 AAACCTGCTCAACACCAGACAAGTGGC-CAAGTATTGAGCAATGGAAATACTGCAGGGG
SEQ3008 AGACCTGCTCAACACCAGACAAGTGGC-CAAGTATTGAGTAATGGAAATACTGCAGGGG
SEQ3009 AAACCTGCTCAACACCAGACAAGTGGC-CAAGTATTGAGCAATGGAAATACTGCAGGGG
SEQ3010 AGGTTTGTAAGTTGTCTCAGTAACAGCATAAGTTTGTTGTGCCTGACTGGTAGCAGGGG

SEQ3001 GGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTT
SEQ3002 TATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTT
SEQ3003 TATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTT
SEQ3004 TATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTT
SEQ3005 TATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTT
SEQ3007 GGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTT
SEQ3008 TATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTT
SEQ3009 GGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTT
SEQ3010 A-TTT--TCTGTTACAACTGCTTGTTCTACAGCCGCCTCTTCACTCGCAGTAACTTGTT

SEQ3001 GGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAG
SEQ3002 GGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAG
SEQ3003 GGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAG
SEQ3004 GGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAG
SEQ3005 GGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAG
SEQ3007 GGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAG
SEQ3008 GGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAG
SEQ3009 GGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAG
SEQ3010 GCTGAGA-ATTAGCTTCTGTAGATTGAG AA — CTTGATTTTGGGGCTTCATTAGATG

SEQ3001 CTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAG
SEQ3002 CTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAG
SEQ3003 CTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAG
SEQ3004 CTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGA
SEQ3005 CTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAG
SEQ3007 CTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAG
SEQ3008 CTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAG
SEQ3009 CTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAG
SEQ3010 CCTGAGAAGGTTTT GGAGCCTGTTTTACATCTTCTACTTTTGATTTAGATGTCGC

SEQ3001 TAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC--
SEQ3002 TAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC--
SEQ3003 TAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC--
SEQ3004
SEQ3005 TAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC--
SEQ3007 TAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC--
SEQ3008 TAATTCAGCTATTAAAGCTT
SEQ3009 TAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTA
SEQ3010 TTAGTCA-TTTTTGATTTTTTGGCTACGCGAACTTTATCTGCTTTTGACAAAGA

>SEQ ID NO 3050: 25_1169NT frame: 1

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEVPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV 
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3051:25_18RS21 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI 
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3052:25_2603 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3053:25_090 frame: 3 

AKKSKMIKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVV 
TENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQST 
WEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQ 

>SEQ ID NO 3054:25_A909 frame: 1 

KATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVVTENTPAT 
SQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQSTWEHIIAR 
ESNGNPNVANASGASGLFQTMPGWGSTATVQNQVNSAIKAYRAQGLS 

>SEQ ID NO 3055:25_CJB110 frame: 3 

SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTAS 
EEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQM
AAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRA 
QGLSAWGY 

>SEQ ID NO 3056:25_COH1 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3057:25_H36B frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI 
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKA 

>SEQ ID NO 3058:25_M732 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWG 

>SEQ ID NO 3059:25_M781 frame: 4 

SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTAS
EEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAVGSAAAAQM
AAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRA
QGLSAWGY 

SEQ3050 SSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEVPKSSSQSTEAN
SEQ3051 SSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SEQ3052 SSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SEQ3053 -----------------AKKSKMIKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SEQ3054 ------------------------KATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SEQ3055 -------SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SEQ3056 SSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SEQ3057 SSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SEQ3058 SSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SEQ3059 -------SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN



SEQ3050 SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
SEQ3051 SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
SEQ3052 SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
SEQ3053 SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
SEQ3054 SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
SEQ3055 SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
SEQ3056 SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
SEQ3057 SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
SEQ3058 SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
SEQ3059 SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV

SEQ3050 GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN
SEQ3051 GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN
SEQ3052 GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN
SEQ3053 GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQ
SEQ3054 GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQNQVN
SEQ3055 GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN
SEQ3056 GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN
SEQ3057 GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN
SEQ3058 GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN
SEQ3059 GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN

SEQ3050 AIKAYRAQGLSAWGY
SEQ3051 AIKAYRAQGLSAWGY
SEQ3052 AIKAYRAQGLSAWGY
SEQ3053
SEQ3054 AIKAYRAQGLS
SEQ3055 AIKAYRAQGLSAWGY
SEQ3056 AIKAYRAQGLSAWGY
SEQ3057 AIKA
SEQ3058 AIKAYRAQGLSAWG-
SEQ3059 AIKAYRAQGLSAWGY




SEQ ID NO. 3101: SAG2148 FROM THE 1169NT1 GBS TYPE V STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTG 
TCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAA 
GCAGAAGCAAAATCTCAACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCA 
AAAGAAGAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTG 
TCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTAC
GGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACGGCTGGTAT 

SEQ ID NO. 3102: SAG2148 FROM THE 18RS21 GBS TYPE II STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTG 
TCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAA 
GCAGAAGCAAAATCTCAACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCA 
AAAGAAGAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTG 
TCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTAC 
GGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACGGCTGGTAT 

SEQ ID NO. 3103: SAG2148 FROM THE 2603 V/R GBS TYPE V STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTG 
TCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAA
GCAGAAGCAAAATCTCAACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCA
AAAGAAGAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTG
TCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTAC
GGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACGGCTGGTAT

SEQ ID NO. 3104: SAG2148 FROM THE 090 GBS TYPE Ia STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTG 
TCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTAAAGCTAGTCAA 
GCAGAAGCAAAATCTCAACCAACAATTGAAAATTCAATGAATTCTT 

AAAGAAGAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTG 
TCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTAC 
GGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACGGCTGGTAT 

SEQ ID NO. 3105: SAG2148 FROM THE A909 GBS TYPE Ia STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTG 
TCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAA 
GCAGAAGCAAAATCTCAACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCA 
AAAGAAGAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTG 
TCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTAC 
GGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACGGCTGGTAT 

SEQ ID NO. 3106: SAG2148 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTG
TCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTAAAGCTAGTCAA
GGAGAAGCAAAATCTCAACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCA 
AAAGAAGAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTG 
TCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTAC 
GGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACGGCTGGTAT 

SEQ ID NO. 3107: SAG2148 FROM THE COH1 GBS TYPE III STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAATAGTTAGTG
TCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAA 
GCAGAAGCAAAATCTCAACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCA 
AAAGAAGAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTG 
TCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTAC 
GGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACGGCTGGTAT 
Comparative Sequences relating to SAG2148
(LysM domain protein)

SEQ ID NO. 3108: SAG2148 FROM THE H36b GBS TYPE Ib STRAIN
(REVERSE COMPLEMENT) 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTG 
TCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAA 
GCAGAAGCAAAATCTCAACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCA
AAAGAAGAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTG 
TCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTAC 
GGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACGGCTGGTAT 

SEQ ID NO. 3109: SAG2148 FROM THE JM9130013 GBS TYPE VIII STRAIN 
(REVERSE COMPLEMENT) 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTG
TCTCTCAATAGTATCAGTAACGCTGACGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAACTAGTCAA 
GCAGAAGCAAAATCTCAACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCA 
AAAGAAGAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTG 
TCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTAC 
GGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACGGCTGGTAT 

SEQ ID NO. 3110: SAG2148 FROM THE M732 GBS TYPE III STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAATAGTTAGTG
TCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAA
GCAGAAGCAAAATCTCAACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCA
AAAGAAGAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTG 
TCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTAC 
GGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACGGCTGGTAT 

SEQ ID NO. 3111: SAG2148 FROM THE M781 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAATAGTTAGTG
TCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAA
GCAGAAGCAAAATCTCAACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCA
AAAGAAGAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTG
TCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTAC 
GGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACGGCTGGTAT 

SEQ3101 GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACT
SEQ3102 GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACT 
SEQ3103 GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACT 
SEQ3104 GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACT 
SEQ3105 GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACT 
SEQ3106 GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACT
SEQ3107 GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACT 
SEQ3108 GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACT 
SEQ3109 GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACT 
SEQ3110 GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACT 
SEQ3111 GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACT 

SEQ3101 ACGGTACAAGAGTTAGTGTCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGT 

SEQ3102 ACGGTACAAGAGTTAGTGTCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGT 

SEQ3103 ACGGTACAAGAGTTAGTGTCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGT 

SEQ3104 ACGGTACAAGAGTTAGTGTCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGT 

SEQ3105 ACGGTACAAGAGTTAGTGTCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGT 

SEQ3106 ACGGTACAAGAGTTAGTGTCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGT 

SEQ3107 ACGGTACAATAGTTAGTGTCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGT 

SEQ3108 ACGGTACAAGAGTTAGTGTCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGT 

SEQ3109 ACGGTACAAGAGTTAGTGTCTCTCAATAGTATCAGTAACGCTGACGTCATCAGTATAGGT 

SEQ3110 ACGGTACAATAGTTAGTGTCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGT 

SEQ3111 ACGGTACAATAGTTAGTGTCTCTCAATAGTATCAGTAACGCTGATGTCATCAGTATAGGT
SEQ3101 GATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTCAACCAACA
SEQ3102 GATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTCAACCAACA
SEQ3103 GATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTCAACCAACA
SEQ3104 GATGTTTTAAAATTGGATAATTCTAAAGCTAGTCAAGCAGAAGCAAAATCTCAACCAACA
SEQ3105 GATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTCAACCAACA
SEQ3106 GATGTTTTAAAATTGGATAATTCTAAAGCTAGTCAAGCAGAAGCAAAATCTCAACCAACA
SEQ3107 GATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTCAACCAACA
SEQ3108 GATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTCAACCAACA
SEQ3109 GATGTTTTAAAATTGGATAATTCTACAACTAGTCAAGCAGAAGCAAAATCTCAACCAACA
SEQ3110 GATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTCAACCAACA
SEQ3111 GATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTCAACCAACA

SEQ3101 ATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAA
SEQ3102 ATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAA
SEQ3103 ATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAA
SEQ3104 ATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAA
SEQ3105 ATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAA
SEQ3106 ATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAA
SEQ3107 ATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAA
SEQ3108 ATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAA
SEQ3109 ATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAA
SEQ3110 ATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAA
SEQ3111 ATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAA

SEQ3101 GAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGA
SEQ3102 GAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGA
SEQ3103 GAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGA
SEQ3104 GAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGA
SEQ3105 GAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGA
SEQ3106 GAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGA
SEQ3107 GAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGA
SEQ3108 GAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGA
SEQ3109 GAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGA
SEQ3110 GAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGA
SEQ3111 GAAATAGCTCGTCGTGAATCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGA

SEQ3101 AGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAA
SEQ3102 AGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAA
SEQ3103 AGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAA
SEQ3104 AGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAA
SEQ3105 AGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAA
SEQ3106 AGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAA
SEQ3107 AGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAA
SEQ3108 AGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAA
SEQ3109 AGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAA
SEQ3110 AGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAA
SEQ3111 AGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCCTGAAAATCAAGAAAAA

SEQ3101 GTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGG
SEQ3102 GTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGG
SEQ3103 GTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGG
SEQ3104 GTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGG
SEQ3105 GTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGG
SEQ3106 GTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGG
SEQ3107 GTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGG
SEQ3108 GTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGG
SEQ3109 GTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGG
SEQ3110 GTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGG
SEQ3111 GTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGG

SEQ3101 AATAGTAACGGCTGGTAT
SEQ3102 AATAGTAACGGCTGGTAT
SEQ3103 AATAGTAACGGCTGGTAT
SEQ3104 AATAGTAACGGCTGGTAT
SEQ3105 AATAGTAACGGCTGGTAT
SEQ3106 AATAGTAACGGCTGGTAT
SEQ3107 AATAGTAACGGCTGGTAT
SEQ3108 AATAGTAACGGCTGGTAT
SEQ3109 AATAGTAACGGCTGGTAT
SEQ3110 AATAGTAACGGCTGGTAT
SEQ3111 AATAGTAACGGCTGGTAT



>SEQ ID NO 3150 : 15_1169NT frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3151:15_18RS21 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3152:15_2603 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3153:15_090 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSKASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3154:15_A909 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3155:15_CJB110 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSKASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3156:15_COH1 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQ.LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3157:15_H36B frame: 1

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3158:15_JM9130013 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTTSQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3159:15_M732 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQ.LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3160:15_M781 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQ.LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

Comparative Sequences relating to SA<

SEQ3150 ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
SEQ3151 ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
SEQ3152 ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
SEQ3153 ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSKASQAEAKSQPT
SEQ3154 ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
SEQ3155 ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSKASQAEAKSQPT
SEQ3156 ASYTVKSGDTLSAIAKNHKTTVQ-LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
SEQ3157 ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
SEQ3158 ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTTSQAEAKSQPT
SEQ3159 ASYTVKSGDTLSAIAKNHKTTVQ-LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
SEQ3160 ASYTVKSGDTLSAIAKNHKTTVQ-LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT

SEQ3150 IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
SEQ3151 IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
SEQ3152 IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
SEQ3153 IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
SEQ3154 IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
SEQ3155 IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
SEQ3156 IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
SEQ3157 IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
SEQ3158 IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
SEQ3159 IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
SEQ3160 IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK

SEQ3150 VADNYVASRYGSWSAALSFWNSNGWY
SEQ3151 VADNYVVSRYGSWSAALSFWNSNGWY
SEQ3152 VADNYVVSRYGSWSAALSFWNSNGWY
SEQ3153 VADNYVVSRYGSWSAALSFWNSNGWY
SEQ3154 VADNYVASRYGSWSAALSFWNSNGWY
SEQ3155 VADNYVVSRYGSWSAALSFWNSNGWY
SEQ3156 VADNYVASRYGSWSAALSFWNSNGWY
SEQ3157 VADNYVASRYGSWSAALSFWNSNGWY
SEQ3158 VADNYVASRYGSWSAALSFWNSNGWY
SEQ3159 VADNYVASRYGSWSAALSFWNSNGWY
SEQ3160 VADNYVASRYGSWSAALSFWNSNGWY

Table 32: Conversion of ORF Ref Nos. with SAG Ref Nos.

ORF Ref No.  SAGxxxx Ref No.  aa  Annotation
ORF00003  SAG0017  447  PcsB protein
ORF00004  SAG0018  322  ribose-phosphate pyrophosphokinase
ORF00005  SAG0019  391  aminotransferase, class I
[illegible]  SAG0020  253  recombination protein O
[illegible]  SAG0021  283  protease, putative
ORF00009  SAG0022  330  fatty acid/phospholipid synthesis protein PlsX
ORF00010  SAG0023  79  acyl carrier protein
ORF00011  SAG0024  234  phosphoribosylaminoimidazole-succinocarboxamide synthase
ORF00012  SAG0025  1241  phosphoribosylformylglycinamidine synthase, putative
ORF00013  SAG0026  484  amidophosphoribosyltransferase
ORF00014  SAG0027  340  phosphoribosylformylglycinamidine cyclo-ligase
ORF00015  SAG0028  182  phosphoribosylglycinamide formyltransferase
ORF00016  SAG0029  250  acetyltransferase, GNAT family
ORF00017  SAG0030  515  phosphoribosylaminoimidazolecarboxamide formyltransferase/IMP cyclohydrolase
[illegible]  SAG0031  283  peptidase, M23/M37 family
ORF00020  SAG0032  434  group B streptococcal surface immunogenic protein
ORF00021  SAG0033  232  N-acetylmannosamine-6-P epimerase, putative
ORF00022  SAG0034  438  sugar ABC transporter, sugar-binding protein
ORF00023  SAG0035  295  sugar ABC transporter, permease protein
ORF00024  SAG0036  276  sugar ABC transporter, permease protein
ORF00025  SAG0037  147  conserved hypothetical protein
ORF00026  SAG0038  220  conserved hypothetical protein
ORF00027  SAG0039  305  N-acetylneuraminate lyase, putative
ORF00028  SAG0040  293  ROK family protein
ORF00029  SAG0041  325  acetyl xylan esterase, putative
ORF00030  SAG0042  267  phosphosugar-binding transcriptional regulator, RpiR family, putative
ORF00031  SAG0043  421  phosphoribosylamine-glycine ligase
ORF00032  SAG0044  162  phosphoribosylaminoimidazole carboxylase, catalytic subunit
ORF00033  SAG0045  363  phosphoribosylaminoimidazole carboxylase, ATPase subunit
[illegible]  SAG0046  [illegible]  hypothetical protein
[illegible]  SAG0047  [illegible]  adenylosuccinate lyase
ORF00037  SAG0048  [illegible]  transcriptional regulator, Cro/CI family
ORF00038  SAG0049  [illegible]  Holliday junction DNA helicase RuvB
ORF00039  SAG0050  [illegible]  phosphotyrosine protein phosphatase, low molecular
ORF00040  SAG0051  126  MORN motif family protein
ORF00041  SAG0052  592  membrane protein, putative
ORF00042  SAG0053  880  aldehyde-alcohol dehydrogenase
ORF00043  SAG0054  338  alcohol dehydrogenase, propanol-preferring
ORF00044  SAG0055  496  threonine synthase
ORF00045  SAG0056  412  MATE efflux family protein
ORF00046  SAG0057  102  ribosomal protein S10
ORF00047  SAG0058  208  ribosomal protein L3
ORF00048  SAG0059  207  ribosomal protein L4
ORF00049  SAG0060  98  ribosomal protein L23
ORF00050  SAG0061  277  ribosomal protein L2
ORF00052  SAG0062  92  ribosomal protein S19
ORF00054  SAG0063  114  ribosomal protein L22
ORF00055  SAG0064  217  ribosomal protein S3
ORF00056  SAG0065  137  ribosomal protein L16
ORF00058  SAG0066  68  ribosomal protein L29
ORF00059  SAG0067  86  ribosomal protein [illegible]
ORF00060  SAG0068  122  ribosomal protein L14
ORF00061  SAG0069  101  ribosomal protein L24
ORF00063  SAG0070  180  ribosomal protein L5
ORF00064  SAG0071  61  ribosomal protein S14, putative
ORF00065  SAG0072  132  ribosomal protein S8
ORF00066  SAG0073  [illegible]  ribosomal protein L6
ORF00068  SAG0074  [illegible]  ribosomal protein L18
ORF00069  SAG0075  [illegible]  ribosomal protein S5
ORF00070  SAG0076  [illegible]  ribosomal protein L30
ORF00071  SAG0077  [illegible]  [illegible]
ORF00072  SAG0078  [illegible]  preprotein translocase, SecY subunit
ORF00073  SAG0079  [illegible]  adenylate kinase
ORF00074  SAG0080  79  translation initiation factor IF-1
ORF00075  SAG0081  [illegible]  ribosomal protein L36
ORF00077  SAG0082  191  ribosomal protein S13
ORF00078  SAG0083  [illegible]  ribosomal protein S11
ORF00080  SAG0084  312  DNA-directed RNA polymerase, alpha subunit
ORF00081  SAG0085  128  ribosomal protein L17
ORF00087  SAG0086  97  hypothetical protein
ORF00088  SAG0087  59  hypothetical protein
ORF00089  SAG0088  56  hypothetical protein
ORF00090  SAG0089  183  conserved hypothetical protein
ORF00091  SAG0090  139  conserved hypothetical protein
ORF00093  SAG0091  144  transcriptional regulator ComX1, putative
ORF00094  SAG0092  230  phosphoglycerate mutase family protein
ORF00095  SAG0093  250  D-alanyl-D-alanine carboxypeptidase family protein
ORF00096  SAG0094  191  N-acetylmuramoyl-L-alanine amidase, family 4 protein
ORF00097  SAG0095  [illegible]  heat-inducible transcription repressor HrcA
ORF00098  SAG0096  [illegible]  heat shock protein GrpE
ORF00099  SAG0097  609  dnaK protein
ORF00100  SAG0098  [illegible]  dnaJ protein
ORF00101  SAG0099  [illegible]  transcriptional regulator, GntR family
ORF00102  SAG0100  [illegible]  tRNA pseudouridine synthase A
ORF00103  SAG0101  [illegible]  phosphomethylpyrimidine kinase, putative
ORF00104  SAG0102  [illegible]  conserved hypothetical protein
ORF00105  SAG0103  [illegible]  conserved hypothetical protein
ORF00106  SAG0104  280  conserved hypothetical protein
ORF00107  SAG0105  427  trigger factor
ORF00108  SAG0106  191  DNA-directed RNA polymerase, delta subunit, putative
ORF00109  SAG0107  534  CTP synthase
ORF00110  SAG0108  308  conserved hypothetical protein
ORF00111  SAG0109  148  deoxyuridine 5'-triphosphate nucleotidohydrolase
ORF00112  SAG0110  454  DNA repair protein RadA
ORF00113  SAG0111  165  carbonic anhydrase-related protein
ORF00115  SAG0112  439  pyridine nucleotide-disulphide oxidoreductase family protein
ORF00116  SAG0113  484  glutamyl-tRNA synthetase
ORF00117  SAG0114  322  ribose ABC transporter, periplasmic D-ribose-binding protein
ORF00118  SAG0115  310  ribose ABC transporter, permease protein
ORF00119  SAG0116  492  ribose ABC transporter, ATP-binding protein
ORF00120  SAG0117  132  ribose ABC transporter protein RbsD
ORF00121  SAG0118  303  ribokinase
ORF00122  SAG0119  328  ribose operon repressor RbsR
ORF00123  SAG0120  32  hypothetical protein
ORF00124  SAG0121  362  permease, putative
ORF00125  SAG0122  228  ABC transporter, ATP-binding protein
ORF00126  SAG0123  223  DNA-binding response regulator
ORF00128  SAG0124  356  sensor histidine kinase
ORF00129  SAG0125  396  argininosuccinate synthase
ORF00130  SAG0126  462  argininosuccinate lyase
ORF00131  SAG0127  293  fructose-bisphosphate aldolase
ORF00132  SAG0128  [illegible]  L-2-hydroxyisocaproate dehydrogenase
ORF00133  SAG0129  [illegible]  ribosomal protein [illegible]
ORF00134  SAG0130  191  conserved hypothetical protein
ORF00135  SAG0131  [illegible]  [illegible] domain protein
ORF00136  SAG0132  [illegible]  [illegible] domain/[illegible] family protein
ORF00137  SAG0133  [illegible]  conserved hypothetical protein
[illegible]  SAG0134  [illegible]  hypothetical protein
[illegible]  SAG0135  [illegible]  amino acid ABC transporter, ATP-binding protein
[illegible]  SAG0136  [illegible]  amino acid ABC transporter, amino acid-binding protein/permease protein
ORF00143  SAG0137  627  conserved hypothetical protein
ORF00145  SAG0138  279  undecaprenol kinase, putative
ORF00146  SAG0139  251  negative regulator of competence MecA, putative
[illegible]  SAG0140  [illegible]  glycosyl transferase, group [illegible] family protein
[illegible]  SAG0141  [illegible]  ABC transporter, ATP-binding protein
ORF00150  SAG0142  [illegible]  conserved hypothetical protein
ORF00151  SAG0143  410  selenocysteine lyase
ORF00152  SAG0144  [illegible]  NifU family protein
ORF00153  SAG0145  472  conserved hypothetical protein
ORF00154  SAG0146  395  penicillin-binding protein 4, putative
ORF00155  SAG0147  411  D-alanyl-D-alanine carboxypeptidase
ORF00156  SAG0148  551  oligopeptide ABC transporter, substrate binding protein, putative
[illegible]  SAG0149  [illegible]  oligopeptide ABC transporter, permease protein
[illegible]  SAG0150  [illegible]  oligopeptide ABC transporter, permease protein
ORF00160  SAG0151  [illegible]  oligopeptide ABC transporter, ATP-binding protein
ORF00161  SAG0152  310  oligopeptide ABC transporter, ATP-binding protein
ORF00166  SAG0153  283  4-diphosphocytidyl-2C-methyl-D-erythritol kinase
ORF00167  SAG0154  147  adc operon repressor AdcR
ORF00168  SAG0155  236  zinc ABC transporter, ATP-binding protein
ORF00169  SAG0156  270  zinc ABC transporter, permease protein
ORF00172  SAG0158  419  tyrosyl-tRNA synthetase
ORF00173  SAG0159  765  penicillin-binding protein 1B, putative
ORF00174  SAG0160  1191  DNA-directed RNA polymerase, beta subunit
ORF00176  SAG0161  1216  DNA-directed RNA polymerase beta' subunit
ORF00178  SAG0162  121  conserved hypothetical protein
ORF00179  SAG0163  323  competence protein CglA
ORF00180  SAG0164  282  competence protein CglB
ORF00181  SAG0165  151  conserved hypothetical protein
ORF00182  SAG0166  123  conserved domain protein
ORF00183  SAG0167  324  conserved hypothetical protein
ORF00184  SAG0168  397  acetate kinase
ORF00186  SAG0169  68  transcriptional regulator, Cro/CI family
ORF00187  SAG0170  45  hypothetical protein
ORF00188  SAG0171  151  hypothetical protein
ORF00189  SAG0172  221  protease, putative
ORF00190  SAG0173  256  pyrroline-5-carboxylate reductase
ORF00191  SAG0174  355  glutamyl-aminopeptidase
ORF00192  SAG0175  79  hypothetical protein
ORF00193  SAG0176  94  conserved hypothetical protein
ORF00194  SAG0177  107  thioredoxin family protein
ORF00195  SAG0178  208  tRNA binding domain protein
ORF00196  SAG0179  238  conserved hypothetical protein
ORF00198  SAG0180  131  single-strand binding protein
ORF00199  SAG0181  214  hydrolase, haloacid dehalogenase-like family
ORF00200  SAG0182  581  sensor histidine kinase, putative
ORF00201  SAG0183  246  response regulator
ORF00203  SAG0184  151  conserved hypothetical protein
ORF00204  SAG0185  242  membrane protein, putative
ORF00205  SAG0186  36  hypothetical protein
ORF00206  SAG0187  542  oligopeptide ABC transporter, oligopeptide-binding protein
ORF00207  SAG0188  325  oligopeptide ABC transporter, permease protein
ORF00208  SAG0189  273  oligopeptide ABC transporter, permease protein
ORF00209  SAG0190  267  peptide ABC transporter, ATP-binding protein
ORF00210  SAG0191  208  peptide ABC transporter, ATP-binding protein
ORF00211  SAG0192  676  PTS system, IIABC components
ORF00212  SAG0193  541  alpha amylase family protein
ORF00214  SAG0194  639  transcriptional antiterminator, BglG family
ORF00216  SAG0195  377  IS1548, transposase
ORF00217  SAG0196  66  conserved domain protein
ORF00218  SAG0197  94  PTS system, IIB component, putative
ORF00219  SAG0198  451  PTS system, IIC component, putative
ORF00220  SAG0199  285  transketolase, N-terminal subunit
ORF00221  SAG0200  309  transketolase, C-terminal subunit
ORF00223  SAG0201  419  oxidoreductase, putative
ORF00224  SAG0202  89  ribosomal protein S15
ORF00225  SAG0203  709  polyribonucleotide nucleotidyltransferase
ORF00226  SAG0204  250  conserved hypothetical protein
ORF00227  SAG0205  194  serine O-acetyltransferase
ORF00228  SAG0206  60  hypothetical protein
ORF00229  SAG0207  447  cysteinyl-tRNA synthetase
ORF00230  SAG0208  128  conserved hypothetical protein
ORF00231  SAG0209  251  RNA methyltransferase, TrmH family, group 3
ORF00232  SAG0210  172  conserved hypothetical protein
ORF00233  SAG0211  286  DegV family protein
ORF00234  SAG0212  32  hypothetical protein
ORF00235  SAG0213  39  hypothetical protein
ORF00236  SAG0214  148  ribosomal protein L13
ORF00237  SAG0215  130  ribosomal protein S9
ORF00238  SAG0216  33  hypothetical protein
ORF00239  SAG0217  384  site-specific recombinase, phage integrase family
ORF00240  SAG0218  158  transcriptional regulator, Cro/CI family
ORF00241  SAG0219  101  hypothetical protein
ORF00242  SAG0220  92  conserved hypothetical protein
ORF00243  SAG0221  76  hypothetical protein
ORF00244  SAG0222  108  conserved domain protein
ORF00245  SAG0223  209  conserved hypothetical protein, fusion
ORF00246  SAG0224  332  replication initiation protein, putative
ORF00247  SAG0225  144  hypothetical protein
ORF00248  SAG0226  418  recombination protein
ORF00249  SAG0227  156  hypothetical protein
ORF00250  SAG0228  111  conserved hypothetical protein
ORF00251  SAG0229  95  conserved hypothetical protein
ORF00252  SAG0230  96  conserved hypothetical protein
ORF00253  SAG0231  135  hypothetical protein
ORF00254  SAG0232  186  hypothetical protein
ORF00255  SAG0233  226  hypothetical protein
ORF00256  SAG0234  [illegible]  hypothetical protein
ORF00257  SAG0235  [illegible]  hypothetical protein
ORF00258  SAG0236  [illegible]  hypothetical protein
ORF00259  SAG0237  [illegible]  hypothetical protein
ORF00260  SAG0238  41  hypothetical protein
ORF00261  SAG0239  [illegible]  transcriptional regulator, MutR family
ORF00262  SAG0240  [illegible]  transporter, putative
ORF00263  SAG0241  [illegible]  amino acid ABC transporter, permease protein
ORF00264  SAG0242  308  amino acid ABC transporter, amino acid-binding protein
ORF00265  SAG0243  211  amino acid ABC transporter, permease protein
ORF00266  SAG0244  381  amino acid ABC transporter, ATP-binding protein
ORF00272  SAG0245  159  hypothetical protein
ORF00273  SAG0246  268  hypothetical protein
ORF00274  SAG0247  116  hypothetical protein
ORF00275  SAG0248  90  hypothetical protein
ORF00276  SAG0249  116  hypothetical protein
ORF00278  SAG0250  193  hypothetical protein
ORF00279  SAG0251  79  transcriptional regulator, Cro/CI family
ORF00280  SAG0252  [illegible]  acetyltransferase, GNAT family
ORF00281  SAG0253  [illegible]  acetyltransferase, GNAT family
ORF00282  SAG0254  [illegible]  acetyltransferase, GNAT family
[illegible]  SAG0255  [illegible]  conserved hypothetical protein
[illegible]  SAG0256  [illegible]  RNA polymerase sigma-70 factor, ECF subfamily
ORF00285  SAG0257  [illegible]  hypothetical protein
ORF00287  SAG0258  [illegible]  transcriptional regulator, TetR family
ORF00288  SAG0259  [illegible]  ABC transporter, efflux protein, DrrB family, putative
ORF00289  SAG0260  238  ABC transporter, ATP-binding protein
ORF00290  SAG0261  129  IS1381, transposase OrfB
ORF00291  SAG0262  127  IS1381, transposase OrfA
ORF00292  SAG0263  171  hypothetical protein
ORF00293  SAG0264  103  conserved hypothetical protein
ORF00294  SAG0265  235  conserved hypothetical protein
ORF00295  SAG0266  382  N-acetylglucosamine-6-phosphate deacetylase
ORF00296  SAG0267  180  conserved hypothetical protein
ORF00297  SAG0268  304  glycyl-tRNA synthetase, alpha subunit
ORF00298  SAG0269  213  acyl carrier protein phosphodiesterase, putative
ORF00299  SAG0270  679  glycyl-tRNA synthetase, beta subunit
ORF00300  SAG0271  85  conserved hypothetical protein
ORF00301  SAG0272  87  membrane protein, putative
ORF00302  SAG0273  502  glycerol kinase
ORF00303  SAG0274  609  alpha-glycerophosphate oxidase
ORF00304  SAG0275  232  glycerol uptake facilitator protein
ORF00305  SAG0276  445  NADH oxidase, putative
ORF00306  SAG0277  476  conserved hypothetical protein
ORF00307  SAG0278  661  transketolase
ORF00308  SAG0279  101  conserved hypothetical protein
ORF00309  SAG0280  244  ABC transporter, ATP-binding protein
ORF00310  SAG0281  534  membrane protein, putative
ORF00313  SAG0282  461  PTS system, IIBC components
ORF00314  SAG0283  267  [illegible]
ORF00315  SAG0284  417  [illegible]
ORF00316  SAG0285  298  conserved hypothetical protein TIGR[illegible]
ORF00317  SAG0286  [illegible]  cell division protein FtsL, putative
ORF00318  SAG0287  752  penicillin-binding protein 2X
ORF00319  SAG0288  [illegible]  phospho-N-acetylmuramoyl-pentapeptide-transferase
ORF00320  SAG0289  447  ATP-dependent RNA helicase, DEAD/DEAH box family
ORF00321  SAG0290  270  ABC transporter, substrate-binding protein
ORF00322  SAG0291  267  amino acid ABC transporter, permease protein
ORF00323  SAG0292  247  amino acid ABC transporter, ATP-binding protein
ORF00324  SAG0293  74  conserved hypothetical protein
ORF00325  SAG0294  304  thioredoxin reductase
ORF00326  SAG0295  486  conserved hypothetical protein
ORF00327  SAG0296  273  NAD synthetase
ORF00328  SAG0297  444  aminopeptidase C
ORF00329  SAG0298  750  penicillin-binding protein 1A
ORF00330  SAG0299  199  recombination protein U
ORF00331  SAG0300  172  conserved hypothetical protein
ORF00332  SAG0301  40  hypothetical protein
ORF00333  SAG0302  110  conserved hypothetical protein
ORF00335  SAG0303  384  conserved hypothetical protein
ORF00336  SAG0304  487  conserved hypothetical protein
ORF00337  SAG0305  160  autoinducer-2 production protein LuxS
ORF00338  SAG0306  535  [illegible]
ORF00340  SAG0307  33  hypothetical protein
ORF00341  SAG0308  [illegible]  ABC transporter, ATP-binding protein, FRAMESHIFT
ORF00343  SAG0309  246  transporter, permease protein, putative
ORF00344  SAG0310  361  conserved hypothetical protein
ORF00345  SAG0311  [illegible]  DNA-binding response regulator, POINT MUTATION
ORF00347  SAG0312  234  conserved hypothetical protein
ORF00348  SAG0313  209  guanylate kinase
ORF00349  SAG0314  104  DNA-directed RNA polymerase, omega subunit, putative
ORF00350  SAG0315  796  primosomal protein N'
ORF00351  SAG0316  311  methionyl-tRNA formyltransferase
ORF00352  SAG0317  440  Sun protein
ORF00353  SAG0318  245  serine/threonine phosphatase, putative
ORF00354  SAG0319  651  serine/threonine protein kinase
ORF00355  SAG0320  231  conserved hypothetical protein
ORF00356  SAG0321  339  sensor histidine kinase, putative
ORF00358  SAG0322  213  DNA-binding response regulator
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Table 32: Conv rsi n of ORF Ref Nos. with SAGk f Nos. 



ORFR fNo. 


SAGxxxx Ref No. 


aa 


Ann tat! n 


ORF00359 


SAG0323 


466 


hydrolase, haloacid dehalogenase family/peptidyi- 
prolyl cis-trans isomerase, cyclophilin type 


ORF00360 


SAG0324 


124 


general stress protein, putative 


ORF00361 


SAG0325 


258 


pyruvate formate-lyase-activating enzyme 


ORF00362 


SAG0326 


251 


transcriptional regulator, DeoR family 


ORF00363 


SAG0327 


327 


transcriptional regulator, putative 


ORF00364


SAG0328 


107 


PTS system, cellobiose-specific IIA component


ORF00366


SAG0329 


106 


PTS system, cellobiose-specific IIB component 


ORF00367 


SAG0330 


433 


PTS system, cellobiose-specific IIC component


ORF00368 


SAG0331 


818 


formate acetyltransferase 


ORF00369 


SAG0332 


222 


transaldolase family protein 


ORF00371 


SAG0333 


362 


glycerol dehydrogenase 


ORF00372 


SAG0334 


308 


cysteine synthase A 


ORF00373 


SAG0335 


214 


conserved hypothetical protein TIGR00257 


ORF00374 


SAG0336 


429 


helicase, putative 


ORF00375 


SAG0337 


221 


competence protein F, putative 


ORF00376 


SAG0338 


184 


ribosomal subunit interface protein 


ORF00382


SAG0339 


450 


aspartate kinase family protein 


ORF00383 


SAG0340 


216 


hydrolase, haloacid dehalogenase-like family 


ORF00384 


SAG0341 


49 


hypothetical protein 


ORF00385 


SAG0342 


263 


enoyl-CoA hydratase/isomerase family protein


ORF00386 


SAG0343 


144 


transcriptional regulator, MarR family 








3-oxoacyl-(acyl-carrier-protein) synthase III


ORF00388




7A 


acyl carrier protein


ORF00390 


SAG0346 


319 


enoyl-(acyl-carrier-protein) reductase II


ORF00391


SAG0347


308 


malonyl CoA-acyl carrier protein transacylase


ORF00392 


SAG0348 


244 


3-oxoacyl-[acyl-carrier protein] reductase 


ORF00393 


SAG0349 


410 


3-oxoacyl-(acyl-carrier-protein) synthase II 


ORF00394 


SAG0350 


166 


acetyl-CoA carboxylase, biotin carboxyl carrier protein


ORF00395






: 

(3R)-hydroxymyristoyl-(acyl-carrier-protein) dehydratase


ORF00396 


SAG0352 


456 


acetyl-CoA carboxylase, biotin carboxylase


ORF00397


SAG0353


291


acetyl-CoA carboxylase, carboxyl transferase, beta subunit


ORF00398 


SAG0354 


257 


acetyl-CoA carboxylase, carboxyl transferase, alpha subunit


ORF00399 


SAG0355 


210 


conserved hypothetical protein 


ORF00400 


SAG0356 


425 


seryl-tRNA synthetase


ORF00402 


SAG0357 


330 


hypothetical protein 


ORF00403 


SAG0358 


120 


conserved hypothetical protein


ORF00404 


SAG0359 


303 


PTS system, mannose-specific IID component 


ORF00405 


SAG0360 


270 


PTS system, mannose-specific IIC component


ORF00406 


SAG0361 


336 


PTS system, mannose-specific IIAB components 


ORF00407 


SAG0362 


270 


hydrolase, haloacid dehalogenase-like family 


ORF00408 


SAG0363 


194 


hypothetical protein 


ORF00409 


SAG0364


203 


membrane protein, putative 


ORF00410 


SAG0365 


473 


xanthine/uracil permease family protein 


ORF00411 


SAG0366 


169 


conserved hypothetical protein TIGR00150, putative 


ORF00412 


SAG0367 


186 


acetyltransferase, GNAT family 


ORF00413 


SAG0368 


435 


transcriptional regulator, putative 


ORF00414 


SAG0369 


98 


conserved hypothetical protein 


ORF00415 


SAG0370 


139 


HIT family protein 


ORF00416 


SAG0371 


167 


hypothetical protein 


ORF00417 


SAG0372 


85 


hypothetical protein



ORF00419 


SAG0373 


241 


ABC transporter, ATP-binding protein 


ORF00421


SAG0374 


344 


ABC transporter, permease protein 


ORF00422 


SAG0375 


266 


conserved hypothetical protein 


ORF00423 


SAG0376 


211 


conserved hypothetical protein TIGR00091 


ORF00424 


SAG0377 


127 


conserved hypothetical protein, POINT MUTATION 


ORF00425 


SAG0378 


379 


N utilization substance protein A 


ORF00426


SAG0379 


98 


conserved hypothetical protein 


ORF00427 


SAG0380 


100 


ribosomal protein L7A family 


ORF00428


SAG0381 


927 


translation initiation factor IF-2


ORF00429 


SAG0382 


122 


ribosome-binding factor A 


ORF00430 


SAG0383 


334 


conserved hypothetical protein 


ORF00431 


SAG0384 


138 


transcriptional repressor CopY 


ORF00432 


SAG0385 


744 


copper-transporter ATPase CopA 


ORF00433 


SAG0386 


68 


copper-transporter protein CopZ 


ORF00434 


SAG0387 


204 


conserved hypothetical protein 


ORF00435 


SAG0388 


270 


hydrolase, haloacid dehalogenase-like family


ORF00436 


SAG0389 


880 


DNA polymerase I 


ORF00437 


SAG0390 


146 


CoA binding domain protein 


ORF00438 


SAG0391 


159 


transcriptional regulator, Fur family 


ORF00439 


SAG0392 


521 


cell wall surface anchor family protein


ORF00440 


SAG0393 


228 


DNA-binding response regulator 


ORF00441


SAG0394 


345 


sensor histidine kinase 


ORF00442 


SAG0395 


246 


conserved hypothetical protein


ORF00443 


SAG0396 


380 


queuine tRNA-ribosyltransferase 


ORF00444 


SAG0397 


102 


conserved hypothetical protein 


ORF00445 


SAG0398 


179 


bioY family protein 


ORF00446 


SAG0399 


258 


AtsA/ElaC family protein 


ORF00447 


SAG0400 


168 


cytidine/deoxycytidylate deaminase family protein 


ORF00448 


SAG0401 


44 


hypothetical protein 


ORF00449 


SAG0402 


449 


glucose-6-phosphate isomerase 


ORF00450


SAG0403 


175 


5-formyltetrahydrofolate cyclo-ligase family protein


ORF00451 


SAG0404 


225 


rhomboid family protein 


ORF00452 


SAG0405 


347 


lipoprotein 


ORF00453 


SAG0406 


299 


UTP-glucose-1-phosphate uridylyltransferase


ORF00454 


SAG0407 


338 


glycerol-3-phosphate dehydrogenase (NAD(P)+) 


ORF00455 


SAG0408 


109 


ribonuclease P protein component 


ORF00456 


SAG0409 


271 


SpoIIIJ family protein


ORF00458 


SAG0410 


273 


R3H domain protein 


ORF00463 


SAG0411 


177 


conserved hypothetical protein 


ORF00464 


SAG0412 


258 


RecX protein 


ORF00465 


SAG0413 


451 


RNA methyltransferase, TrmA family


ORF00466 


SAG0414 


153 


conserved hypothetical protein


ORF00467 


SAG0415 


142 


acetyltransferase, GNAT family


ORF00468 


SAG0416 


1233 


protease, putative 


ORF00469 


SAG0417 


302 


glycosyl transferase, group 2 family protein 


ORF00470 


SAG0418 


336 


ribonucleoside-diphosphate reductase 2, beta subunit 


ORF00471 


SAG0419 


137 


nrdI protein


ORF00472 


SAG0420 


721 


ribonucleoside-diphosphate reductase 2, alpha subunit 


ORF00473 


SAG0421 


1055 


conserved hypothetical protein 


ORF00474 


SAG0422 


129 


conserved hypothetical protein 


ORF00475 


SAG0423 


132


conserved domain protein


ORF00476 


SAG0424 


94 


hypothetical protein 


ORF00478 


SAG0425 


105 


carboxymuconolactone decarboxylase family protein


ORF00479 


SAG0426 


131 


conserved hypothetical protein 


ORF00480 


SAG0427 


129 


transcriptional regulator, MerR family 


ORF00482 


SAG0428 


345 


alcohol dehydrogenase, zinc-containing 


ORF00483 


SAG0429 


284 


oxidoreductase, aldo/keto reductase family 


ORF00484 


SAG0430 


287


cation efflux system protein 


ORF00485 


SAG0431 


174 


transcriptional regulator, TetR family 


ORF00486 


SAG0432 


397 


transcriptional regulator, AraC family 


ORF00487 


SAG0433 


1389 


surface protein Rib 


ORF00488 


SAG0434 


61 


transposase, IS256 family, truncation 


ORF00489 


SAG0435 


97 


DNA-damage-inducible protein J, putative 


ORF00490 


SAG0436 


62 


hypothetical protein 


ORF00491 


SAG0437 


123 


hypothetical protein 


ORF00493


SAG0438 


145 


bacteriophage L54a, integrase, truncation 


ORF00495 


SAG0439 




conserved hypothetical protein, FRAMESHIFT 


ORF00496 


SAG0440 


84 


conserved hypothetical protein 


ORF00497 


SAG0441 


103 


conserved domain protein 


ORF00499 


SAG0442 


189 


acetyl transferase, GNAT family 


ORF00500 


SAG0443 


194 


acetyltransferase, GNAT family 




SAG0444


I oo 


conserved hypothetical protein


ORF00502




ooo 


valyl-tRNA synthetase


ORF00503


SAG0446


31 Q 


oxidoreductase, Gfo/Idh/MocA family


ORF00504


SAG0447 


9R7 


magnesium transporter, CorA family


ORF00506 




057 1 


UallolJUodoc, laiiiiiy 


ORF00507


SAG0449




conserved hypothetical protein


ORF00508 




330 


aspartate-ammonia ligase


ORF00510 


SAG0451 


149 




ORF00511


SAG0452 


179 


type II DNA modification methyltransferase, putative


ORF00512 


SAG0453 


96 


hypothetical protein 


ORF00513 


SAG0454 


161 


phosphopantetheine adenylyltransferase 


ORF00515 


SAG0455 


357 


conserved hypothetical protein 


ORF00518 


SAG0456 




conserved hypothetical protein, degenerate 


ORF00519 


SAG0457


192 


conserved hypothetical protein 


ORF00520


SAG0458 


368 


conserved hypothetical protein TIGR00048 


ORF00521 


SAG0459 


171 


VanZF domain protein


ORF00522 


SAG0460 


581 


ABC transporter, ATP-binding/permease protein 


ORF00523 


SAG0461 


579


ABC transporter, ATP-binding/permease protein 


ORF00524 


SAG0462 


188


anthranilate synthase component II 


ORF00525 


SAG0463 


179 


bioY family protein 


ORF00526 


SAG0464 


330 


biotin synthetase 


ORF00527 


SAG0465 


164 


hypothetical protein 


ORF00528 


SAG0466 


371 


thiolase 


ORF00531 


SAG0467 


409 


AMP-binding enzyme domain protein 


ORF00532 


SAG0468 


210 


endonuclease III


ORF00533 


SAG0469 


131 


type IV prepilin peptidase-related protein 


ORF00534 


SAG0470 


69 


conserved hypothetical protein 


ORF00535 


SAG0471 


322 


glucokinase 


ORF00536 


SAG0472 


126 


rhodanese domain protein 


ORF00537 


SAG0473 


613 


elongation factor Tu family protein 


ORF00538 


SAG0474 


81 


conserved hypothetical protein 


ORF00540 


SAG0475 


451 


UDP-N-acetylmuramoylalanine-D-glutamate ligase


ORF00541 


SAG0476 


358 


UDP-N-acetylglucosamine--N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase




SAG0477


^7R 

Of O I 


cell division protein DivIB, putative




SAG0478




cell division protein FtsA


ORF00545 


SAG0479 


426 


cell division protein FtsZ 


ORF00546


SAG0480


224


ylmE protein, putative


ORF00547


SAG0481


201


ylmF protein


ORF00548


SAG0482


84


YGGT family protein


ORF00549


SAG0483 


262 


ylmH protein 


ORF00550


SAG0484


256 


cell division protein DivIVA, putative


ORF00552 


SAG0485 


930 


isoleucyl-tRNA synthetase 


ORF00553 


SAG0486 


100 


conserved hypothetical protein 


ORF00554 


SAG0487 


151 


MutT/nudix family protein 


ORF00555 


SAG0488 


753 


ATP-dependent Clp protease, ATP-binding subunit 


ORF00556 


SAG0489 


34 


hypothetical protein 


ORF00557 


SAG0490 


76 


conserved hypothetical protein 


ORF00558 


SAG0491 


230 


amino acid ABC transporter, permease protein 


ORF00559 


SAG0492 


244 


amino acid ABC transporter, ATP-binding protein 


ORF00560 


SAG0493 


564 


phosphoglucomutase/phosphomannomutase family 
protein 


ORF00562 


SAG0494 


284 


methylenetetrahydrofolate dehydrogenase/methenyltetrahydrofolate cyclohydrolase


UixrUUuOO 


SAG0495


CfO 


conserved hypothetical protein 


Ui\rUUOO*r 


SAG0496


AAfi 


exodeoxyribonuclease VII, large subunit


I Ur\rUUODO 


SAG0497


t\ 


exodeoxyribonuclease VII, small subunit


ur\ruujuD 


SAG0498




geranyltranstransferase, putative






97c 


hemolysin A


vr\rUU3DO 


SAG0500




arginine repressor ArgR, putative




SAG0501


ceo 


DNA repair protein RecN


ORF00571


SAG0502




degV family protein


ORF00572


SAG0503


97Q 

£.19 


lipase/acylhydrolase, putative




SAG0504




conserved hypothetical protein


ORF00574




Q1 
57 1 


DNA-binding protein HU


vr\ruuj » w 




Ow 


hypothetical protein


ORF00576


SAG0507


310 


dihydroorotate dehydrogenase A


ORF00577 




A11 
*t 1 1 


beta-lactam resistance factor


ORF00578






beta-lactam resistance factor


ORF00579 


SAG0510 


406 


murM protein, putative 


ORF00580


SAG0511


u 


hydrolase, haloacid dehalogenase-like family 


ORF00581


SAG0512 


438 


HD domain protein 


ORF00582 


SAG0513 


128 


conserved hypothetical protein 


ORF00583 


SAG0514 


894 


cation-transporting ATPase, E1-E2 family 


ORF00584






conserved hypothetical protein 


ORF00585 


SAG0516 


643 


fructose-1,6-bisphosphatase, putative


ORF00586 


SAG0517 


374 


iron-sulfur cluster-binding protein, putative 


ORF00587 


SAG0518 




peptide chain release factor 2, FRAMESHIFT 


ORF00588 


SAG0519 


230 


cell division ABC transporter, ATP-binding protein FtsE


ORF00589 


SAG0520 


309 


cell division ABC transporter, permease protein FtsX 


ORF00590 


SAG0521 


236


carboxymethylenebutenolidase-related protein


ORF00591 


SAG0522 


232 


metallo-beta-lactamase superfamily protein


ORF00592




OCA 


oxidoreductase, short chain dehydrogenase/reductase family


ORF00593 


SAG0524 


835 


DNA polymerase III, epsilon subunit/ATP-dependent helicase DinG


ORF00595 


SAG0525 


397 


aspartate aminotransferase 


ORF00596 


SAG0526 


448 


asparaginyl-tRNA synthetase


ORF00597


SAG0527 


1 Q*x 


conserved hypothetical protein


ORF00598


SAG0528 




inosine-uridine preferring nucleoside hydrolase


ORF00599 


SAG0529 


*3R 


hypothetical protein


ORF00600 


SAG0530 


1^7 


Nxoiiivx/wi ii tciii my piuicni 


ORF00601 


SAG0531 




conserved hypothetical protein


ORF00602


SAG0532 


•DC-** 


conserved hypothetical protein


ORF00603






1 In^harar^tori^Arl Rf**E? /lR-t 

uiiuiioiacienzea Dvxrs, i»uoi*toi 


ORF00604


SAG0534


too 


dipeptidase


ORF00605


SAG0535


QUO 


zinc ABC transporter, zinc-binding adhesion lipoprotein


ORF00606 


SAG0536 


86 


ribosomal protein L31


ORF00607


SAG0537




DHH family protein


Vxlxl Vx uo uo 


SAG0538


VX*tU 


adenosine deaminase, putative




SAG0539


14/ 


flavodoxin


vxrvi ixixvx i \j 




01 


chorismate mutase, putative 


ORF00611


SAG0541


QQQ 


voltage-gated chloride channel family protein 


ORF00612




1*x*7 


IS1381, transposase OrfA


vrxruuu i «x 




i 


IS1381, transposase OrfB


ORF00614




•lie 
1 JO 


ribosomal protein L19 


ORF00615




ooy 


site-specific recombinase, phage integrase family 


ORF00617 


SAG0546 


67 


conserved domain protein


ORF00618 


SAG0547 


185 


hypothetical protein


ORF00619 


SAG0548 


265 


repressor protein, putative


ORF00620 


SAG0549 


47 


hypothetical protein


ORF00621 


SAG0550


74 


conserved hypothetical protein


ORF00622 


SAG0551 


52 


conserved hypothetical protein 


ORF00623 


SAG0552 


62 


hypothetical protein 


ORF00624 


SAG0553 


268 


hypothetical protein 


ORF00626 


SAG0554 


63 


transcriptional regulator, Cro/CI family 


ORF00627 


SAG0555 


249 


antirepressor, putative 


ORF00628 


SAG0556 


47 


hypothetical protein 


ORF00630 


SAG0557 


76 


hypothetical protein 


ORF00632 


SAG0558 


74 


hypothetical protein 


ORF00633 


SAG0559 


286 


conserved hypothetical protein 


ORF00634 


SAG0560


77 


conserved hypothetical protein 


ORF00635 


SAG0561 


46 


hypothetical protein


ORF00636 


SAG0562 


84 


hypothetical protein 


ORF00637 


SAG0563


53 


hypothetical protein 


ORF00638 


SAG0564 


160 


conserved hypothetical protein 


ORF00639 


SAG0565 


224 


conserved domain protein 


ORF00640 


SAG0566 


138 


single-strand binding protein 


ORF00641 


SAG0567 


439 


reverse transcriptase/maturase family protein 


ORF00642


SAG0568 


67


conserved hypothetical protein 


ORF00643 


SAG0569 


158


conserved hypothetical protein 


ORF00644 


SAG0570 


115 


hypothetical protein 


ORF00645


SAG0571 


43 


hypothetical protein 


ORF00646 


SAG0572 


138 


conserved hypothetical protein


ORF00647 


SAG0573 


54 


hypothetical protein


ORF00648 


SAG0574 


89 


conserved hypothetical protein 


ORF00649 


SAG0575 


110 


hypothetical protein 


ORF00650 


SAG0576 


43 


hypothetical protein 


ORF00652 


SAG0577 


177 


conserved hypothetical protein


ORF00653 


SAG0578 


88 


conserved hypothetical protein


ORF00654 


SAG0581 


118 


conserved hypothetical protein


ORF00655


SAG0582 


422 


conserved hypothetical protein


ORF00656 


SAG0583 


406 


conserved hypothetical protein


ORF00657 


SAG0584 


62 


conserved hypothetical protein, truncation


ORF00658 


SAG0585 


471 


conserved hypothetical protein


ORF00659


SAG0586 


154 


conserved hypothetical protein


ORF00660 


SAG0587 


300 


structural protein, putative


ORF00661 


SAG0588 


71 


conserved hypothetical protein


ORF00662 


SAG0589 


143 


conserved hypothetical protein


ORF00663


SAG0590 


112 


conserved hypothetical protein


ORF00664 


SAG0591 


78 


conserved hypothetical protein


ORF00665 


SAG0592 


111 


conserved hypothetical protein


ORF00666 


SAG0593 


185 


structural protein


ORF00667 


SAG0594 


81 


conserved hypothetical protein


ORF00668 


SAG0595 


123 


conserved hypothetical protein


ORF00669 


SAG0596 


670 


PblA, internal deletion


ORF00670 


SAG0597 


506 


minor structural protein, putative


ORF00671 


SAG0598 


1374 


minor structural protein, putative


ORF00672 


SAG0599 


668 


minor structural protein, putative


ORF00673 


SAG0600 


109 


hypothetical protein 


ORF00674


SAG0601




hypothetical protein 


ORF00675 




mo 


conserved hypothetical protein


ORF00676 




111 
ill 


holin, putative


ORF00677 






lysin, putative


ORF00678






conserved hypothetical protein


ORF00679 






conserved hypothetical protein


ORF00681 




*iR 


conserved hypothetical protein


ORF00682 


SAG0608 




hypothetical protein


ORF00683 


SAG0609


1Q^ 


site-specific recombinase, phage integrase family


ORF00685


SAG0610 


134 


conserved hypothetical protein


ORF00687 


SAG0611 




transposase, degenerate, FRAMESHIFT


ORF00689 


SAG0612 


53 


conserved hypothetical protein, FRAMESHIFT


ORF00690 


SAG0613 


425 


transmembrane protein Vexp1


ORF00691 


SAG0614


218 


ABC transporter, ATP-binding protein Vexp2 


ORF00692 


SAG0615 


458 


transmembrane protein Vexp3 


ORF00693 


SAG0616 


217 


DNA-binding response regulator VncR 


ORF00694 


SAG0617 


439 


sensor histidine kinase VncS 


ORF00695 


SAG0618


195 


transposase OrfB, IS3 family, truncation 


ORF00697 


SAG0619


66 


conserved hypothetical protein 


ORF00698 


SAG0620 


62 


hypothetical protein 


ORF00699 


SAG0621


401 


rod shape-determining protein RodA, putative


ORF00700 


SAG0622


186 


hydrolase, haloacid dehalogenase-like family


ORF00701 


SAG0623 


650 


DNA gyrase, B subunit 


ORF00702 


SAG0624 


574 


septation ring formation regulator EzrA, putative 


ORF00703 


SAG0625 


213 


phosphoserine phosphatase SerB 


ORF00704 


SAG0626 


161 


MutT/nudix family protein 


ORF00705 


SAG0627 


151 


conserved hypothetical protein 


ORF00706 


SAG0628


435 


enolase 


ORF00707 


SAG0629 


354 


conserved domain protein


ORF00708 


SAG0630 


427 


3-phosphoshikimate 1-carboxyvinyltransferase


ORF00709 


SAG0631 


170 


shikimate kinase


ORF00710 


SAG0632 


457 


psr protein


ORF00711 


SAG0633 


451 


RNA methyltransferase, TrmA family 


UnrUUf 1 & 


SAG0634


7fi 


hypothetical protein 


ORPnn7i ^ 

ur\ruu/ 10 


SAG0635


OAR 


acid phosphatase precursor, class B 


ORPnnriA 


SAG0636


470 
1 i £. 


conserved hypothetical protein 


ORF00717






transcriptional regulator, TetR family, putative, FRAMESHIFT


ORF00718 


SAG0638 


109 


cell wall surface anchor family protein


ORF00720 


SAG0639 


273 


transposase OrfB, IS3 family


ORF00721 


SAG0640 




transposase OrfA, IS3 family


ORF00722


SAG0641




Tn5252, Orf 10 protein, degenerate, POINT MUTATION


ORF00723 


SAG0642 


59 


hypothetical protein


ORF00725 


SAG0643




chaperonin, 33 kDa, DEGENERATE


ORF00726 


SAG0644 


402 


transcriptional regulator, AraC family


ORF00727 


SAG0645 


554 


cell wall surface anchor family protein, putative


ORF00728 


SAG0646 


307 


cell wall surface anchor family protein


ORF00729 


SAG0647 


305 


sortase family protein


ORF00731 


SAG0648 


260 


sortase family protein


ORF00732 


SAG0649 


890 


cell wall surface anchor family protein, putative


ORF00734 


SAG0650 


189 


sortase family protein, FRAMESHIFT


ORF00735 


SAG0651 


201 


hypothetical protein


ORF00737 


SAG0653 


76 


conserved hypothetical protein, DEGENERATE


ORF00738 


SAG0654 


34 


hypothetical protein


ORF00740 


SAG0656 


36 


hypothetical protein


ORF00741 


SAG0657 


89 


hypothetical protein


ORF00742 


SAG0658 


383 


lipoprotein, putative


ORF00743


SAG0659 


330 


ABC transporter, ATP-binding protein


ORF00744 


SAG0660 


272 


membrane protein


ORF00745 


SAG0661 


261 


conserved hypothetical protein


ORF00747 


SAG0663 


282 


cylD protein


ORF00748 


SAG0664 


240 


cylG protein 


ORF00749 


SAG0665 


101 


acyl carrier protein AcpC 


ORF00750 


SAG0666 


158 


cylZ protein FRAMESHIFT 


ORF00751 


SAG0667 


309 


cylA protein 


ORF00752 


SAG0668 


292 


cylB protein 


ORF00753 


SAG0669


667 


cylE protein 


ORF00754


SAG0670 


317 


cylF protein


ORF00755 


SAG0671 


731 


cylI protein


ORF00756 


SAG0672 


403 


cylJ protein


ORF00757 


SAG0673 


191 


cylK protein


ORF00758 


SAG0674 


113 


hypothetical protein 


ORF00759 


SAG0675 


171 


surface protein antigen-related protein


ORF00760 


SAG0676 


885 


serine protease, subtilase family, putative 


ORF00761 


SAG0677 


1062 


hypothetical protein 


ORF00762 


SAG0678 




endopeptidase O DEGENERATE 


ORF00766 


SAG0679


286 


hydrolase, alpha/beta fold family, putative 


ORF00767 


SAG0680 


339 


hypothetical protein 


ORF00768 


SAG0681 


353 


conserved domain protein 


ORF00769 


SAG0682 


409


permease, putative 


ORF00770 


SAG0683 




transmembrane protein Vexp3, putative FRAMESHIFT 


ORF00774 


SAG0684 


223 


ABC transporter, ATP-binding protein


ORF00775 


SAG0685 


472 


conserved hypothetical protein 


ORF00776 


SAG0686 


261 


DNA-entry nuclease, putative 


ORF00777 


SAG0687 


212 


DedA family protein, putative 


ORF00778


SAG0688 


218 


ABC transporter, ATP-binding protein


ORF00779 


SAG0689 


257 


membrane protein, putative


ORF00780 


SAG0690 


272 


conserved hypothetical protein


ORF00781 


SAG0691


294 


transcriptional regulator, LysR family


ORF00783 


SAG0692


193 


regulatory protein, putative


UnrUU/a3 




077 
O/ / 


IS1548, transposase


UrxrUUrOO 




I/O 


regulatory protein, putative, truncation 


ORF00787




00U 


D-lactate dehydrogenase


ORF00788


SAG0696


C<IC 


sodium:galactoside symporter family protein, putative


ORF00789 


SAG0697 


341 


2-keto-3-deoxygluconate kinase


ORF00790




RQQ 


beta-glucuronidase


ORF00791






transcriptional regulator, GntR family


ORF00792 






2-dehydro-3-deoxyphosphogluconate aldolase/4-hydroxy-2-oxoglutarate aldolase


ORF00793 


SAG0701 


466 


Glucuronate isomerase 


ORF00794 


SAG0702 


348 


mannonate dehydratase 


ORF00795 


SAG0703 


279 


D-mannonate oxidoreductase


ORF00796 


SAG0704 


270 


hydrolase, haloacid dehalogenase-like family


ORF00797 


SAG0705 


596 


glycosyl hydrolase, family 3


ORF00798 


SAG0706 


361 


proline dipeptidase


ORF00799 


SAG0707 


334 


transcriptional regulator, RegM family


ORF00800 


SAG0708 


488 


alpha amylase family protein 


ORF00801




339 


glycosyl transferase, group 1 family protein




SAG0710


*t*t*t 


glycosyl transferase, group 1 family protein


ORF00803 


SAG0711


647 


threonyl-tRNA synthetase


ORF00804


SAG0712 


234 


DNA-binding response regulator


ORF00805 


SAG0713 


33Q 


conserved hypothetical protein


ORF00806


SAG0714 


188


conserved hypothetical protein


ORF00807 


SAG0715 


216 


amino acid ABC transporter, permease protein


ORF00808 


SAG0716


231


amino acid ABC transporter, permease protein


ORF00809 


SAG0717


266


amino acid ABC transporter, amino acid-binding protein


ORF00810 


SAG0718 


251 


amino acid ABC transporter, ATP-binding protein


ORF00811 


SAG0719 


236 


DNA-binding response regulator 


ORF00812 


SAG0720 


449 


sensory box histidine kinase 


ORF00813 


SAG0721 


269 


metallo-beta-lactamase family protein


ORF00814 


SAG0722 


122


conserved hypothetical protein 


ORF00815 


SAG0723 


236 


ribonuclease III


ORF00816 


SAG0724 


1179 


SMC family protein 


ORF00817 


SAG0725 


265


hydrolase, haloacid dehalogenase-like family


ORF00818


SAG0726 


274 


hydrolase, haloacid dehalogenase-like family 


ORF00819 


SAG0727 


536 


signal recognition particle-docking protein FtsY 


ORF00820 


SAG0728 


270 


ABC transporter, substrate-binding protein 


ORF00821 


SAG0729 


300 


ABC transporter, permease protein, putative 


ORF00822 


SAG0730 


42 


ABC transporter, ATP-binding protein 


ORF00823 


SAG0731 


347 


bacterial luciferase family protein


ORF00824 


SAG0732 


720 


transcriptional accessory protein Tex, putative 


ORF00825 


SAG0733 


142 


conserved hypothetical protein 


ORF00826 


SAG0734 


87 


phage shock protein C, putative


ORF00827 


SAG0735 


44 


hypothetical protein 


ORF00828 


SAG0736 


311 


HPr(Ser) kinase/phosphatase


ORF00830 


SAG0737 


257 


prolipoprotein diacylglyceryl transferase


ORF00832 


SAG0738 


132 


conserved hypothetical protein 


ORF00833


SAG0739 


143 


conserved hypothetical protein


ORF00834 


SAG0740 


91 


conserved hypothetical protein


ORF00835 


SAG0741 


303 


peptidase, U32 family, putative 


ORF00836 


SAG0742 


428 


peptidase, U32 family 


ORF00837 


SAG0743 


70 


conserved hypothetical protein


ORF00838 


SAG0744 


265 


membrane protein, putative


ORF00839 


SAG0745 


446 


Mn2+/Fe2+ transporter, NRAMP family


ORF00840 


SAG0746 


369 


riboflavin biosynthesis protein RibD


ORF00841 


SAG0747 


208 


riboflavin synthase, alpha subunit


ORF00842 


SAG0748 


397 


riboflavin biosynthesis protein RibA


ORF00843 


SAG0749 


156 


riboflavin synthase, beta subunit


ORF00844 


SAG0750 


496 


lysyl-tRNA synthetase


ORF00845


SAG0751 


300 


hydrolase, haloacid dehalogenase-like family


ORF00846 


SAG0752 


213 


phosphoglycerate mutase family protein


ORF00847 


SAG0753


157


eh<iO familv nrntein ni itativch 
ouov icii i my \ji uicn i, ^ju id Live 


ORF00848 


SAG0754


205


conserved domain protein


ORF00850 


SAG0755 


282 


peptidase, U32 family


ORF00852 


SAG0756


174


conserved hypothetical protein


ORF00853 


SAG0757 


12Q 


lipoprotein, putative


ORF00855 


SAG0758


www 




ORF00856 


SAG0759 


931 


phosphoenolpyruvate carboxylase




SAG0760


Of 1 


IS1548, transposase


or Fnnft *iQ 


SAG0761




cell division protein, FtsW/RodA/SpoVE family 


ORFfinflRI 




OQp 

www 


translation elongation factor Tu 


ORFOnRR*^ 


SAG0763




triosephosphate isomerase 


ORFflORRi* 






phosphoglycerate mutase


ORFOORfifi 




Do 1 


penicillin-binding protein 2b


ORF00867


SAG0766


Iww 


recombination protein RecR


ORF00868


SAG0767


w*KJ 


D-alanine--D-alanine ligase


ORF00869


SAG0768


*tDD 


UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate--D-alanyl-D-alanyl ligase


ORF00870


SAG0769


4HR 

*tuo 


oxalate:formate antiporter


ORF00871 


SAG0770




conserved hypothetical protein


ORF00872 



SAG0771 



w 1^ 


cell wall surface anchor family protein


ORF00873 



SAG0772


^1 A 
O l*f 


peptide chain release factor 3


ORF00874 


SAG0773




conserved hypothetical protein


ORF00876 


SAG0774 


244 


ABC transporter, ATP-binding protein 


ORF00878


SAG0775




ABC transporter, permease protein 


ORF00879






lipoprotein, putative


ORF00880 


SAG0777 


528 


ATP-dependent RNA helicase, DEAD/DEAH box 
family 


ORF00RR9 


SAG0778


oo 


conserved hypothetical protein 


ORFOnAR^ 

\Jr\rUUOQO 


SAG0779




conserved hypothetical protein


ORF00884 


SAG0780 


246 


acyltransferase family protein


ORF00885 


SAG0781 


217 


competence protein CelA 


ORF00887 


SAG0782 


745 


DNA internalization-related competence protein
ComEC/Rec2 


ORF00888 


SAG0783 


269 


hydrolase, haloacid dehalogenase-like family


ORF00889 


SAG0784 


314 


sugar-binding transcriptional regulator, LacI family


ORF00890 


SAG0785 


330 


conserved hypothetical protein 


ORF00891 


SAG0786 


242 


conserved domain protein 


ORF00892 


SAG0787 


345 


DNA polymerase III, delta subunit, putative
Table 32: Conversion of ORF Ref Nos. with SAG Ref Nos.



ORF Ref No.


SAGxxxx Ref No.


aa


Annotation


ORF00893 


SAG0788 


202 


superoxide dismutase Fe-Mn 


ORF00894 


SAG0789


283 


transcriptional antiterminator LicT


ORF00895 


SAG0790 


622 


PTS system, beta-glucosides-specific IIABC
components


ORF00896 


SAG0791 


475 


6-phospho-beta-glucosidase


ORF00898


SAG0792 


364 


conserved hypothetical protein


ORF00899 


SAG0793 


380 


conserved hypothetical protein TinRnnnd^ s


ORF00900 


SAG0794 


418 


permease, GntP family


ORF00902 


SAG0795 


354 


conserved hypothetical protein


ORF00903


SAG0796 


147 


transcriptional regulator, MarR family


ORF00904


SAG0797 


342 


S-adenosylmethionine:tRNA ribosyltransferase-isomerase


ORF00905 


SAG0798 


226 


membrane protein, putative 


ORF00906 


SAG0799 


233 


glucosamine-6-phosphate isomerase


ORF00907 


SAG0800 


318 


Glutathione S-transferases domain protein 


ORF00908 


SAG0801 


239 


ribosomal small subunit pseudouridine synthase A


ORF00909 


SAG0802 


38 


hypothetical protein 


ORF00910 


SAG0803 


383 


major facilitator family protein


ORF00911 


SAG0804 


315 


competence protein CoiA 


ORF00912 


SAG0805 


601 


oligoendopeptidase B 


ORF00913


SAG0806 


208 


hydrolase, haloacid dehalogenase-like family


ORF00914 


SAG0807 


235 


O-methyltransferase family protein


ORF00916 


SAG0808 


309 


protease maturation protein, putative 


ORF00918 


SAG0809 


161 


conserved hypothetical protein 


ORF00919 


SAG0810 


872 


alanyl-tRNA synthetase


ORF00921 


SAG0811 


238 


membrane protein, putative 


ORF00922 






glycosyl transferase, family o


ORF00923 


SAG0813 


81 


hypothetical protein 




SAG0814


95 


conserved domain protein 


ORF00925


SAG0815


71 


transcriptional regulator, Cro/CI family 


ORF00926


SAG0816 


253 


conserved hypothetical protein 


ORF00927


SAG0817 


187 


conserved hypothetical protein 


ORF00928 


SAG0818 


319 


ribonucleoside-diphosphate reductase 2, beta subunit 


ORF00929


SAG0819 


719 


ribonucleoside-diphosphate reductase 2, alpha subunit 




SAG0820 


74 


ribonucleoside-diphosphate reductase 2, NrdH-redoxin 


ORF00931 


SAG0821


Of 


phosphocarrier protein HPr 


ORF00932




Off 


phosphoenolpyruvate-protein phosphotransferase 


ORF00933 


SAG0823 




glyceraldehyde-3-phosphate dehydrogenase, NADP-dependent


ORF00934


SAG0824 


417


polysaccharide deacetylase family protein


ORF00935 


SAG0825 


360 


ATP-dependent RNA helicase, DEAD/DEAH box
family 


ORF00936 


SAG0826 


209 


uridine kinase 


ORF00937 


SAG0827 


165 


conserved hypothetical protein 


ORF00938 


SAG0828 


554 


DNA polymerase III, gamma and tau subunits


ORF00939 


SAG0829


64 


conserved hypothetical protein 


ORF00940


SAG0830 


311 


biotin-acetyl-CoA-carboxylase ligase


ORF00941


SAG0831 


398 


S-adenosylmethionine synthetase 


ORF00942 


SAG0832 


753 


hypothetical protein 


ORF00943 


SAG0833 


181 


hypothetical protein 


ORF00944 


SAG0834 


42 


hypothetical protein 


ORF00945 


SAG0835 


188 


conserved hypothetical protein 


ORF00946 


SAG0836 


184


conserved hypothetical protein


ORF00948 


SAG0837 


428 


ABC transporter, ATP-binding protein


ORF00950 


SAG0838 


233 


hypothetical protein 


ORF00951 


SAG0839 


226 


transcriptional regulator, TenA family


ORF00952 


SAG0840 


265 




ORF00953


SAG0841 


256 


hydroxyethylthiazole kinase


ORF00954 


SAG0842 


223 


thiamine-phosphate pyrophosphorylase


ORF00955 


SAG0843 


419 


UDP-N-acetylglucosamine 1-carboxyvinyltransferase


ORF00956 


SAG0844 


184 


acetyltransferase, GNAT family


ORF00957


SAG0845 


427 


CBS domain protein 


ORF00958


SAG0846 


286 


methionine aminopeptidase, type 1 


ORF00959 


SAG0847 


306 


iiuuMUuicaot; dim, puiallvc 


ORF00961


SAG0848 


151 


vjun idiiiuy piuicin 


ORF00962 


SAG0849 


1 U9 


conserved hypothetical protein


ORF00963


SAG0850




DNA ligase, NAD-dependent


ORF00964 


SAG0851 


OOi7 


Diiii u protein, puiauve 


ORF00966 




i r OD 


pullulanase, putative


ORF00967 


SAG0853




1,4-alpha-glucan branching enzyme


ORF00968 


SAG0854


^7Q 

Of 9 


glucose-1-phosphate adenylyltransferase


ORF00969 






glycogen biosynthesis protein GlgD, FRAMESHIFT


ORF00971 


SAG0856 


476 


glycogen synthase


ORF00972


SAG0857 


66 


ATP synthase F0, C subunit 


ORF00973 


SAG0858 




ATP synthase F0, A subunit


ORF00974 


SAG0859 


1 ou 


ATP synthase F0, B subunit


ORF00975 


SAG0860 


17ft 

I/O 


ATP synthase F1, delta subunit


ORF00976 


SAG0861 


501 


ATP synthase F1, alpha subunit


ORF00977 


SAG0862 


293 


ATP synthase F1, gamma subunit


ORF00978 


SAG0863 


468 


ATP synthase F1, beta subunit


ORF00979 


SAG0864 


137 


ATP synthase F1, epsilon subunit


ORF00980 


SAG0865


76 


conserved hypothetical protein


ORF00981 


SAG0866 


423 


UDP-N-acetylglucosamine 1-carboxyvinyltransferase


ORF00982 


SAG0867 


63 


conserved hypothetical protein 


ORF00983 


SAG0868 


285 


DNA-entry nuclease 


ORF00984 


SAG0869 


346 


phenylalanyl-tRNA synthetase, alpha subunit


ORF00985 


SAG0870 


173


acetyltransferase, GNAT family


ORF00986 


SAG0871 


801 


phenylalanyl-tRNA synthetase, beta subunit 


ORF00987 






conserved hypothetical protein


ORF00988 


SAG0873 


1077 


exonuclease RexB 


ORF00989 


SAG0874 


1207 


exonuclease RexA


ORF00990


SAG0875 


305 


magnesium transporter, CorA family, putative 


ORF00991 


SAG0876 


458 


tRNA modification GTPase TrmE 


ORF00992 


SAG0877 


636 


ABC transporter, ATP-binding protein 


ORF00993 


SAG0878 


322 


acetoin dehydrogenase, thymine PPi dependent, E1 
component, alpha subunit


ORF00994 


SAG0879 


332 


acetoin dehydrogenase, thymine PPi dependent, E1 
component, beta subunit 


ORF00995 


SAG0880 


462 


acetoin dehydrogenase, thymine PPi dependent, E2 
component, dihydrolipoamide acetyltransferase


ORF00996 


SAG0881 


585 


acetoin dehydrogenase, thymine PPi dependent, E3 
component, dihydrolipoamide dehydrogenase


ORF00997 


SAG0882 


329 


lipoate-protein ligase A 


ORF00998 


SAG0883


261 


cobyric acid synthase, putative 


ORF00999


SAG0884


447 


Mur ligase family protein


ORF01000


SAG0885


283 


conserved hypothetical protein TIGR00159 


ORF01001


SAG0886


319 


Gram-positive signal peptide, YSIRK family domain 
protein


ORF01002


SAG0887




phosphoglucomutase/phosphomannomutase family
protein


ORF01003 


SAG0888


123


conserved hypothetical protein


ORF01004 


SAG0889 



126 




ORF01005 


SAG0890 



376


oxygen-independent coproporphyrinogen III oxidase,
putative


ORF01006 


SAG0891 


245 


conserved hypothetical protein 


ORF01007 


SAG0892 


256 


hydrolase, haloacid dehalogenase-like family 


ORF01008 


SAG0893 


218 


conserved hypothetical protein 


ORF01009 


SAG0894 


1370 


conserved hypothetical protein 


ORF01010 


SAG0895 


289 


lipoyl-binding domain protein


ORF01011 


SAG0896 


108 


oxidoreductase, putative 


ORF01012 


SAG0897 


221 


conserved hypothetical protein


ORF01013 


SAG0898 


83 


hypothetical protein 


ORF01014 


SAG0899 


57 


hypothetical protein


ORF01015 


SAG0900 


56 


hypothetical protein 


ORF01016 


SAG0901 


127 


hypothetical protein


ORF01018 


SAG0902 


45 


hypothetical protein


ORF01019 


SAG0903 


44 


hypothetical protein


ORF01021 


SAG0904 


56 


hypothetical protein


ORF01022 


SAG0905 


138 


nucleoside diphosphate kinase 


ORF01023 


SAG0906 


610 


GTP-binding protein LepA


ORF01024 


SAG0907 


877 


streptococcal histidine triad family protein 


ORF01025 


SAG0908 


203 


HD domain protein 


ORF01026 


SAG0909 


154 


acetyltransferase, GNAT family 


ORF01027 


SAG0910 


144 


PilB-related protein


ORF01030 


SAG0911 


930 


cation-transporting ATPase, E1-E2 family 


ORF01031 


SAG0912 


367 


nucleoside diphosphate kinase domain protein 


ORF01032 


SAG0913 


212 


chloramphenicol acetyltransferase 


ORF01033 


SAG0914 


203 


conserved hypothetical protein 


ORF01034 


SAG0915 


405 


Tn916, transposase 


ORF01035 


SAG0916 


67 


Tn916, excisionase 


ORF01037 


SAG0918 


76 


Tn916, hypothetical protein 


ORF01038 


SAG0919


157 


Tn916, hypothetical protein


ORF01039 


SAG0921 


117 


Tn916, transcriptional regulator, putative 


ORF01040 


SAG0923 


639 


Tn916, tetracycline resistance protein 


ORF01041 


SAG0925 


310 


Tn916, hypothetical protein 


ORF01042 


SAG0926 


333 


Tn916, NLP/P60 family protein 


ORF01044 


SAG0927 


725 


Tn916, hypothetical protein FRAMESHIFT 


ORF01047 


SAG0928 




Tn916, hypothetical protein FRAMESHIFT


ORF01048 


SAG0929 


168 


Tn916, hypothetical protein


ORF01049 


SAG0930 


165 


Tn916, hypothetical protein 


ORF01050 


SAG0931 


73 


Tn916, hypothetical protein 


ORF01051 


SAG0932 


401 


Tn916, transcriptional regulator, putative 


ORF01052 


SAG0933 


461 


Tn916, FtsK/SpoIIIE family protein


ORF01053 


SAG0934 


128 


Tn916, hypothetical protein 


ORF01054 


SAG0935 


104 


Tn916, hypothetical protein 


ORF01056 


SAG0937 




ABC transporter, ATP-binding protein, FRAMESHIFT


ORF01057 


SAG0938 


122 


transcriptional regulator, GntR family 


ORF01058 


SAG0939 


1034 


DNA polymerase III, alpha subunit


ORF01059 


SAG0940


340 


6-phosphofructokinase


ORF01060


SAG0941 


500 


pyruvate kinase 


ORF01061 


SAG0942


I03 


signal peptidase I, putative


ORF01062


SAG0943


A7 


hypothetical protein


ORF01063 


SAG0944




glucosamine--fructose-6-phosphate aminotransferase
(isomerizing)


ORF01064 


SAG0945


377 


IS1548, transposase


ORF01066 


SAG0946 


109 


phnA protein


ORF01068 


SAG0947 


213 


amino acid ABC transporter, permease protein


ORF01069 


SAG0948 


209 


amino acid ABC transporter, ATP-binding protein 


ORF01070 


SAG0949


^/ O 


amino acid ABC transporter, amino acid-binding
protein 


ORF01072 


SAG0950




ribosomal protein S20


ORF01073 




oUO 


pantothenate kinase


ORF01074 




IgO 


conserved hypothetical protein 


ORF01075 


SAG0953


10Q 


cytidine deaminase 


ORF01076 




o4y 


lipoprotein 


ORF01077 


SAG0955


on 


sugar ABC transporter, ATP-binding protein 


ORF01078 







sugar ABC transporter, permease protein, putative 


ORF01079 


SAG0957 


w IO 


sugar ABC transporter, permease protein, putative


ORF01080 


SAG0958 


456 


NADH oxidase 


ORF01081 


SAG0959 


329 


L-lactate dehydrogenase


ORF01082 


SAG0960 


819 


DNA gyrase, A subunit


ORF01083 


SAG0961 


247 


sortase SrtA 


ORF01084


SAG0962


137 


glyoxalase family protein


ORF01085 


SAG0963 


320 


conserved hypothetical protein


ORF01086 


SAG0964 


375 


Na+/H+ exchanger family protein


ORF01087 


SAG0965 


127 


IS1381, transposase OrfA 


ORF01088 


SAG0966 


129 


IS1381, transposase OrfB 


ORF01089 


SAG0967


520 


GMP synthase 


ORF01090 


SAG0968


232 


transcriptional regulator, GntR family 


ORF01091 


SAG0969 


444 


gid protein


ORF01092 


SAG0970


247 


acetyltransferase, GNAT family


ORF01093


SAG0971 


282 


lipoprotein, putative


ORF01095 






conserved hypothetical protein, FRAMESHIFT


ORF01096 


SAG0973




nisin-resistance protein, putative 


ORF01097 


SAG0974 


&WW 


ABC transporter, ATP-binding protein


ORF01098 


SAG0975 


651 


ABC transporter, permease protein, putative


ORF01099 


SAG0976 




DNA-binding response regulator


ORF01100 


SAG0977


Ol£ 


sensor histidine kinase


ORF01101 


SAG0978 




site-specific recombinase, phage integrase family


ORF01102 


SAG0979


553 


ABC transporter, substrate binding protein, putative 


ORF01103 


SAG0980


257 


conserved hypothetical protein 


ORF01104 


SAG0981


228 


SatD 


ORF01106 


SAG0982 


521 


signal recognition particle protein 


ORF01108 


SAG0983 


110 


conserved hypothetical protein 


ORF01109 


SAG0984


437 


sensor histidine kinase CiaH 


ORF01110 


SAG0985 


226 


DNA-binding response regulator CiaR 


ORF01111 


SAG0986 


849 


aminopeptidase N 


ORF01112 


SAG0987 


217 


phosphate transport system regulatory protein PhoU


ORF01113


SAG0988 


252 


phosphate ABC transporter, ATP-binding protein PstB, 
putative 


ORF01114




2o7 


phosphate ABC transporter, ATP-binding protein PstB,
putative 


ORF01115 


SAG0990


295 


phosphate ABC transporter, permease protein PstA, 
putative 


ORF01116 


SAG0991 


Ou3 


phosphate ABC transporter, permease protein


ORF01117 


SAG0992




phosphate ABC transporter, phosphate-binding protein


ORF01118 


SAG0993 


436 


NOL1/NOP2/sun family protein


ORF01119 


SAG0994 


254 


inositol monophosphatase family protein


ORF01120 


SAG0995 


93 


conserved hypothetical protein


ORF01121 


SAG0996 


137 


conserved hypothetical protein


ORF01122 


SAG0997 


310 


iiiouiuuuo-ctiiiuA piuicin rnrc/vnuoiiavin Diosyntnesis 
protein RibF 


ORF01123 


SAG0998 


294 


tRNA pseudouridine synthase B


ORF01124 


SAG0999 


143


acetyltransferase, GNAT family


ORF01125 


SAG1000 


423


conserved hypothetical protein 


ORF01126 


SAG1001




conserved hypothetical protein 


ORF01127


SAG1002 


292 


protease, putative 




SAG1003


876 


permease, putative 


ORF01129 


SAG1004 


233 


ABC transporter, ATP-binding protein


ORF01131 


SAG1005 


706


DNA topoisomerase I


ORF01132 


SAG1006 


280 


DprA/SMF protein, putative DNA processing factor 


ORF01133 


SAG1007 


342 


iron-compound ABC transporter, iron-compound- 
binding protein 


ORF01134


SAG1008


253 


iron compound ABC transporter, ATP-binding protein 




SAG1009





iron compound ABC transporter, permease protein 


ORF01136 


SAG1010 


320 


iron compound ABC transporter, permease protein


ORF01137 


SAG1011 


182 


acetyltransferase, CysE/LacA/LpxA/NodL family


ORF01138 


SAG1012 


253 


ribonuclease HII 


ORF01139 


SAG1013 


283


GTP-binding protein 


ORF01140


SAG1014 


190 


conserved hypothetical protein 


ORF01142


SAG1015 


494 


carbon starvation protein CstA, putative 


ORF01143


SAG1016 


244 


response regulator 


ORF01144 


SAG1017 


579 


sensor histidine kinase, putative 


ORF01145


SAG1018 


40 


hypothetical protein 


ORF01146 


SAG1019


39 


conserved hypothetical protein, FRAMESHIFT 


ORF01148 


SAG1020 


227 


hypothetical protein 


ORF01149 


SAG1021 


107 


hypothetical protein 


ORF01150 


SAG1022 


177 


hypothetical protein 


ORF01151 


SAG1023 


48 


hypothetical protein 


ORF01152 


SAG1024


183 


hypothetical protein 


ORF01153 


SAG1025 


149 


hypothetical protein 


ORF01156 


SAG1026 




immunogenic secreted protein, DEGENERATE 


ORF01157 


SAG1027 


84 


conserved hypothetical protein 


ORF01158 


SAG1028 


196 


hypothetical protein 


ORF01159 


SAG1029 


101 


hypothetical protein 


ORF01160 


SAG1030 


304 


conserved hypothetical protein 


ORF01161 


SAG1031 


120 


extracellular protein, putative, POINT MUTATION


ORF01162


SAG1032 


85 


conserved hypothetical protein 


ORF01164


SAG1033 


1309 


FtsK/SpoIIIE family protein


ORF01166 


SAG1034 


55 


hypothetical protein


ORF01167 


SAG1035 


424 


conserved hypothetical protein


ORF01168 


SAG1036 


80 


conserved hypothetical protein


ORF01169 


SAG1037 


157 


hypothetical protein 


ORF01172 


SAG1038 


1003 


phage infection protein, putative 


ORF01173 


SAG1039 


96 


conserved hypothetical protein


ORF01174 


SAG1040 


260 


conserved domain protein


ORF01175 


SAG1041 


107 


hypothetical protein


ORF01176 


SAG1042 


1060 


carbamoyl-phosphate synthase, large subunit


ORF01177 


SAG1043 


356 


carbamoyl-phosphate synthase, small subunit


ORF01178 


SAG1044 


307 


aspartate carbamoyltransferase


ORF01179 


SAG1045 


430 


dihydroorotase, multifunctional complex type


ORF01180 


SAG1046


209 


orotate phosphoribosyltransferase


ORF01181 


SAG1047 


233 


orotidine 5'-phosphate decarboxylase


ORF01182 


SAG1048 


410 


membrane protein, putative


ORF01183


SAG1049 


513 


ABC transporter, ATP-binding protein


ORF01184 


SAG1050 


112 


ribonucleotide reductase, truncation


ORF01185 


SAG1051 


358 


aspartate-semialdehyde dehydrogenase


ORF01186 


SAG1052


47 


cell wall surface anchor family protein, putative


ORF01187 


SAG1053 



30 


hypothetical protein


ORF01188 


SAG1054 


531 


cardiolipin synthetase


ORF01189 


SAG1055 



556 


formate--tetrahydrofolate ligase


ORF01190 


SAG1056 



339 



lipoate-protein ligase A


ORF01191 


SAG1057 





conserved hypothetical protein


ORF01192 


SAG1058 



272 


conserved hypothetical protein


ORF01193 


SAG1059 



110 


glycine cleavage system H protein, putative


ORF01194 


SAG1060 



328 


bacterial luciferase family protein


ORF01195 


SAG1061 


399 


UAIUUl CUUlf U39B, IIVIIMUII IUII 1 y 


ORF01197 


SAG1062 


282 


lipoate-protein ligase A family protein


ORF01198 


SAG1063 


228 


flavoprotein-related protein


ORF01199 


SAG1064 


180 


flavoprotein family protein


ORF01200 


SAG1065


190 


membrane protein, putative


ORF01201 


SAG1066 


572 


phosphoglucomutase


ORF01202 


SAG1067 


178 


IS861, transposase OrfA


ORF01203 


SAG1068 


277 


IS861, transposase OrfB


ORF01204 


SAG1069 


65 


hypothetical protein


ORF01205


SAG1070 


577 


ABC transporter, ATP-binding/permease protein


ORF01206 


SAG1071 


573 


ABC transporter, ATP-binding/permease protein


ORF01207 


SAG1072 


200 


conserved hypothetical protein 


ORF01208 


SAG1073


325 


conserved hypothetical protein


ORF01209 


SAG1074 


418 


serine hydroxymethyltransferase


ORF01210 


SAG1075 


183 


Sua5/YciO/YrdC/YwlC family protein


ORF01211 


SAG1076 


276 


modification methylase, HemK family


ORF01212 


SAG1077 


359 


peptide chain release factor 1 


ORF01213 


SAG1078 


189 


thymidine kinase


ORF01214 


SAG1079 


60 


4-oxalocrotonate tautomerase 


ORF01215 


SAG1080 


47 


hypothetical protein 


ORF01216 


SAG1081


312 


ApbE family protein 


ORF01217 


SAG1082 


200 


conserved hypothetical protein 


ORF01218 


SAG1083 


411 


conserved hypothetical protein 


ORF01219 


SAG1084 


262 


formate/nitrite transporter family protein 


ORF01220 


SAG1085 


424 


xanthine permease


ORF01221 


SAG1086 


193 


xanthine phosphoribosyltransferase 


ORF01222 


SAG1087


327 


guanosine monophosphate reductase 


ORF01223 


SAG1088 


446 


drug resistance transporter, EmrB/QacA family, 
putative 


ORF01224 


SAG1089 




conserved hypothetical protein


ORF01225 


SAG1090 


666 


potassium uptake protein, putative


ORF01226 


SAG1091


216 


oxidoreductase, short chain dehydrogenase/reductase
family, FRAMESHIFT


ORF01227 


SAG1092 


330 


phosphate acetyltransferase


ORF01228 


SAG1093 


294 


ribosomal large subunit pseudouridine synthase, RluD
subfamily 


ORF01229 


SAG1094 


278 


conserved hypothetical protein 


ORF01230 


SAG1095 


223 


GTP pyrophosphokinase family protein 


ORF01231 


SAG1096 


190 


conserved hypothetical protein 


ORF01232 


SAG1097 


324 


ribose-phosphate pyrophosphokinase 


ORF01233 


SAG1098 


371 


cysteine desulphurase 


ORF01234


SAG1099 


115 


conserved hypothetical protein 


ORF01235


SAG1100 


210 


HM A-hinriinn nrntoin 


ORF01236 


SAG1101 


226 


DNA repair protein RadC 


ORF01237


SAG1102


177 
Of f 


membrane protein, putative 


ORF01238


SAG1103 


478 


6-phospho-beta-glucosidase 


ORF01239


SAG1104


204 


platelet activating factor, putative 


ORF01240 


SAG1105 


273


hydrolase, haloacid dehalogenase-like family 


ORF01241 


SAG1106 


309 


transcriptional regulator, AraC family, putative 


ORF01242 


SAG1107 


510 


voltage-gated chloride channel family protein 


ORF01243


SAG1108 


357 


spermidine/putrescine ABC transporter, 
spermidine/putrescine-binding protein 


ORF01244 


SAG1109 


258 


spermidine/putrescine ABC transporter, permease 
protein 


ORF01245 


SAG1110 


264 


spermidine/putrescine ABC transporter, permease 
protein 


ORF01246


SAG1111 


384 


spermidine/putrescine ABC transporter, ATP-binding 
protein 






300 


UDP-N-acetylenolpyruvoylglucosamine reductase


ORF01248 


SAG1113




2-amino-4-hydroxy-6-hydroxymethyldihydropteridine
pyrophosphokinase


ORF01249 


SAG1114 


120 


dihydroneopterin aldolase 


ORF01250 


SAG1115 


267 


dihydropteroate synthase 


ORF01251 


SAG1116 


187 


GTP cyclohydrolase 1 


ORF01252 


SAG1117 


420 


folylpolyglutamate synthase 


ORF01253 


SAG1118 


295 


rarD protein 


ORF01254 


SAG1119 


288 


homoserine kinase 


ORF01255


SAG1120 


427 


homoserine dehydrogenase 


ORF01256 


SAG1121 


295 


polysaccharide deacetylase family protein 


ORF01257 


SAG1122 


515 


transporter, BCCT family protein 


ORF01258


SAG1123 


34 


hypothetical protein 


ORF01259 


SAG1124 


458 


aldehyde dehydrogenase family protein 


ORF01260 


SAG1125 


335 


membrane protein 


ORF01261 


SAG1126 


228 


conserved hypothetical protein 


ORF01262 


SAG1127 


113 


conserved hypothetical protein, FRAMESHIFT 


ORF01263 




187 


hypothetical protein 


ORF01264


SAG1128 


65 


transcriptional regulator, Cro/CI family


ORF01265


SAG1129 


36 


hypothetical protein 


ORF01266 


SAG1130 


49 


hypothetical protein 


ORF01268 


SAG1131 


164 


thiol peroxidase 


ORF01269 


SAG1132 


219 


conserved hypothetical protein 


ORF01272 


SAG1133 


254 


conserved hypothetical protein 


ORF01273


SAG1134 


213 


transcriptional regulator, GntR family/potassium
uptake protein, TrkA family


ORF01274 


SAG1135 


183 


gls24 protein, putative 


ORF01275 


SAG1136 




conserved hypothetical protein, FRAMESHIFT


ORF01276 


SAG1137 


180 


gls24 protein, putative 


ORF01277 


SAG1138 


64 


conserved hypothetical protein


ORF01279 


SAG1139 


193


conserved hypothetical protein


ORF01280 


SAG1140 


82 


conserved hypothetical protein


ORF01281 


SAG1141 


112 


conserved hypothetical protein 


ORF01282 


SAG1142 


759 


ATP-dependent DNA helicase PcrA


ORF01283 


SAG1143 


100 


conserved hypothetical protein, FRAMESHIFT


ORF01284


SAG1144 


441 


uracil permease 


ORF01285


SAG1145 


448 


sodium:alanine symporter family protein


ORF01286 


SAG1146 


411 


cation efflux family protein 


ORF01287 


SAG1147 


130 


conserved hypothetical protein


ORF01288 


SAG1148 


231 


membrane protein, putative


ORF01289 


SAG1149 


207 


conserved hypothetical protein


ORF01290 


SAG1150 


400 


ribosomal protein S1


ORF01291 


SAG1151 


76 


conserved hypothetical protein


ORF01292


SAG1152 


340 


branched-chain amino acid aminotransferase


ORF01294 


SAG1153 


819 


DNA topoisomerase IV, A subunit


ORF01295 


SAG1154 


653 


DNA topoisomerase IV, B subunit


ORF01296 


SAG1155 


207 


conserved hypothetical protein TIRRnnft9^


ORF01297 


SAG1156 


217 


uracil-DNA glycosylase


ORF01298 


SAG1157 


161 


conserved hypothetical protein


ORF01299 


SAG1158 


413 


CMP-N-acetylneuraminic acid synthetase NeuA


ORF01300 


SAG1159 


209 


neuD protein


ORF01301 


SAG1160


384 


UDP-N-acetylglucosamine-2-epimerase NeuC


ORF01302 


SAG1161 


341 


N-acetyl neuramic acid synthetase NeuB


ORF01303 


SAG1162 


466 


cpsL protein 


ORF01304 


SAG1163 


318 


cpsVK protein 


ORF01305 


SAG1164 


321 


cpsVJ protein 


ORF01306 


SAG1165 


327


cpsVO protein 


ORF01307 


SAG1166 


295 




ORF01308 


SAG1167 


241 


cpsVM protein




SAG1168


OD*r 


cpsVH protein


ORF01310 


SAG1169 


163


CpsVG 




SAG1170


149 


CpsF 




SAG1171


462 


CpsE 


ORF01313


SAG1172


229 


CpsD protein 


ORF01314 


SAG1173 


230 


cpsC protein 


ORF01315 


SAG1174 


243 


capsular polysaccharide biosynthesis protein CpsB 


ORF01316


SAG1175


485 


capsular polysaccharide biosynthesis protein CpsA 


ORF01317 


SAG1176 


290 


capsular polysaccharide synthesis operon 
transcriptional regulator CpsY 


ORF01318 


SAG1177


255 


cpslaS protein 


ORF01319 


SAG1178 


236 


purine nucleoside phosphorylase 


ORF01320 


SAG1179 


418 


voltage-gated chloride channel family protein, putative


ORF01321 


SAG1180 


269 


purine nucleoside phosphorylase


ORF01322 


SAG1181 


135 


arsenate reductase 


ORF01323 


SAG1182 


403 


phosphopentomutase 


ORF01324 


SAG1183 


223 


ribose 5-phosphate isomerase 


ORF01326 


SAG1184 


236 


conserved hypothetical protein 


ORF01327


SAG1185 


262 


tributyrin esterase 


ORF01328 


SAG1186 


553 


metallo-beta-lactamase superfamily protein


ORF01329 


SAG1187 


253 


ABC transporter, ATP-binding protein 


ORF01330 


SAG1188 


287 


ABC transporter, permease protein 


ORF01331


SAG1189 


334 


conserved hypothetical protein 


ORF01332 


SAG1190 


551 


adherence and virulence protein A 


ORF01333


SAG1191 


239 


alpha-acetolactate decarboxylase 


ORF01334 


SAG1192 


560 


acetolactate synthase, catabolic


ORF01335 


SAG1193 


408 


TPR domain protein 


ORF01336 


SAG1194 


396 


membrane protein 


ORF01337 


SAG1195 


153 


MutT/nudix family protein


ORF01338 


SAG1196 


160 


mutator MutT protein 


ORF01339 


SAG1197 


1072 


hyaluronidase 


ORF01340 


SAG1198 


348


dTDP-glucose 4,6-dehydratase


ORF01341 


SAG1199 


197 


dTDP-4-dehydrorhamnose 3,5-epimerase 


ORF01342 


SAG1200 


289 


glucose-1-phosphate thymidylyltransferase


ORF01343 


SAG1201


367 


iiillllUUIdUdctlo UaIUood, puialive 


ORF01344 


SAG1202




rnncon/oH fownnf Hati/^ot nrnfain T 1 f~l D AO/I DC 

ujiiacrveu nypuuicucai protein I iur\UU4oD 


ORF01345 


SAG1203 


227 


conserved hypothetical protein


ORF01346 


SAG1204




DNA replication protein DnaD, putative


ORF01347 


SAG1205 


172 


adenine phosphoribosyltransferase 


ORF01348


SAG1206


004 


conserved domain protein 


ORF01349


SAG1207




hypothetical protein 


ORF01350




732


single-stranded-DNA-specific exonuclease RecJ 


ORF01351






oxidoreductase, short chain dehydrogenase/reductase
family


ORF01352 


SAG1210 


309 


metallo-beta-lactamase superfamily protein


ORF01353 


SAG1211 


215 


conserved hypothetical protein


ORF01354 


SAG1212 


412 


GTP-binding protein HflX


ORF01355 


SAG1213 


296 


tRNA delta(2)-isopentenylpyrophosphate transferase 


ORF01356 


SAG1214 


58 


hypothetical protein


ORF01357 


SAG1215 


305 


exfoliative toxin A, putative


ORF01358 


SAG1216 


1252 


pullulanase, putative


ORF01361 


SAG1217 




conserved hypothetical protein FRAMESHIFT


ORF01362 


SAG1218 


194


conserved hypothetical protein


ORF01363 


SAG1219 


468 


peptidase, M20/M25/M40 family


ORF01364 


SAG1220 


200 


nitroreductase family protein


ORF01365 


SAG1221 




glycerophosphoryl diester phosphodiesterase,
putative, POINT MUTATION 


ORF01367 


SAG1222 


593 


excinuclease ABC, C subunit 


ORF01368 


SAG1223 


255 


conserved hypothetical protein


ORF01369 


SAG1224 


446 


MATE efflux family protein


ORF01370 


SAG1225 


136 


conserved hypothetical protein


ORF01371 


SAG1226 


165


conserved hypothetical protein 


ORF01372 


SAG1227 


198 


conserved hypothetical protein 


ORF01373 


SAG1228 


96 


ISSdy1, transposase OrfA


ORF01374 


SAG1229 


259 


ISSdy1, transposase OrfB


ORF01375 


SAG1230


96


conserved hypothetical protein 


ORF01377 


SAG1231 




transposase OrfB, IS3 family, degenerate
FRAMESHIFT 


ORF01379 


SAG1232 


77 


transposase OrfB, IS3 family, truncation 


ORF01380 


SAG1233


822 


streptococcal histidine triad family protein 


ORF01381 


SAG1234 


306 


laminin-binding surface protein
Table 32: Conversion of ORF Ref Nos. with SAG Ref Nos.



ORF Ref No.


SAGxxxx Ref No. 


aa 


Annotation


ORF01382 


SAG1235 


425 


GBSi1, group II intron, maturase


ORF01383 


SAG1236




c5a peptidase precursor FRAMESHIFT 


ORF01384 


SAG1237 


444 


hypothetical protein 


ORF01385 


SAG1238 


202 


hypothetical protein 


ORF01386 


SAG1239 


76 


conserved hypothetical protein 


ORF01387 


SAG1240 


125 


conserved hypothetical protein, truncation 


ORF01388 


SAG1241 


78 


transposase OrfA, IS3 family 


ORF01389 


SAG1242 


67 


transposase OrfB, IS3 family, truncation 


ORF01390 


SAG1243 


96 


ISSdy1, transposase OrfA FRAMESHIFT


ORF01391 


SAG1244 


259 


ISSdy1, transposase OrfB


ORF01392 


SAG1245 


38 


hypothetical protein 


ORF01393 


SAG1246 


389 


hypothetical protein 


ORF01394 


SAG1247 


399 


integrase, phage family


ORF01395 


SAG1248


75 


conserved hypothetical protein


ORF01396 


SAG1249


74 


transcriptional regulator, Cro/CI family


ORF01397 


SAG1250 


621 


Tn5252, relaxase


ORF01398 


SAG1251 


121 




ORF01399 


SAG1252


120 




ORF01401 


SAG1253 


435 


transposase, ISL3 family


ORF01403 


SAG1254 


546 


mercuric reductase


ORF01404


SAG1255 


130 


mercuric resistance operon regulatory protein MerR


ORF01406 


SAG1256 


142 


IS861, transposase OrfB, truncation 


ORF01407 


SAG1257 


709 


cation-transporting ATPase, E1-E2 family 


ORF01408 


SAG1258 


122 


cadmium efflux system accessory protein 


ORF01409 


SAG1259 


99 


conserved hypothetical protein 


ORF01410 


SAG1260 


262 


hypothetical protein 


ORF01411 


SAG1261 


198 


conserved hypothetical protein 


ORF01412 


SAG1262 


695 


cation-transporting ATPase, E1-E2 family 


ORF01414 


SAG1263 




conserved domain protein, FRAMESHIFT 


ORF01415 


SAG1264 


148 


transcriptional repressor CopY, putative 


ORF01416 


SAG1265 


206 


cadmium resistance transporter, putative 


ORF01417 


SAG1266 


152 


hypothetical protein 


ORF01418 


SAG1267 


108 


hypothetical protein 


ORF01419 


SAG1268 


230 


repressor protein, putative 


ORF01420 


SAG1269


44 


hypothetical protein 


ORF01421 


SAG1270 


471 


ImpB/MucB/SamB family protein 


ORF01423 


SAG1271 


116 


conserved hypothetical protein 


ORF01424 


SAG1272 


102 


conserved hypothetical protein 


ORF01425 


SAG1273 


118 


conserved hypothetical protein 


ORF01426 


SAG1274 


129 


conserved hypothetical protein 


ORF01427 


SAG1275 


75 


hypothetical protein 


ORF01428 


SAG1276 


358 


conserved hypothetical protein 


ORF01430 


SAG1277


163 


hypothetical protein


ORF01431 


SAG1278 


96 


hypothetical protein 


ORF01432 


SAG1279 


99 


conserved domain protein 


ORF01433 


SAG1280


2274 


Helicases conserved C-terminal domain protein 


ORF01434 


SAG1281 


183 


hypothetical protein 


ORF01435 


SAG1282


63 


lipoprotein, putative 


ORF01436 


SAG1283 


1631 


cell wall surface anchor family protein 


ORF01437 


SAG1284 


196


abortive infection protein AbiGI


ORF01438 


SAG1285 


281 


abortive infection protein AbiGII 


ORF01439 


SAG1286


933 


conserved hypothetical protein 


ORF01440 


SAG1287 


776 


conserved hypothetical protein 


ORF01441 


SAG1288 


117 


conserved hypothetical protein, DEGENERATE 
Table 32: Conversion of ORF Ref Nos. with SAG Ref Nos.



ORF Ref No. 


SAGxxxx Ref No. 


aa 


Annotation


ORF01442 


SAG1289 


284 


conserved hypothetical protein


ORF01443


SAG1290 


80 


hypothetical protein 


ORF01444


SAG1291 


605 


Tn5252, Orf 21 protein internal deletion


ORF01445 


SAG1292 


162 


hypothetical protein 


ORF01446 


SAG1293 


194 


protease, putative 


ORF01447


SAG1294 


77 


conserved hypothetical protein


ORF01449 


SAG1295 


127


conserved hypothetical protein 


ORF01450 


SAG1296


142 


conserved hypothetical protein


ORF01451




451 


type II modification methyltransferase Spn5252IP


ORF01452 


SAG1298


31 


hypothetical protein


ORF01453 


SAG1299


070 


conserved hypothetical protein


ORF01454 


SAG1300


*57 


conserved hypothetical protein


ORF01455 


SAG1301


191 


nooauiTioi protein i<& 


ORF01456


SAG1302


IOO 


nhncnmal nrntain 1 4A 1 

iioooumai proiein ltu 


ORF01458 


SAG1303




ATP-dependent Clp protease, ATP-binding subunit


ORF01459 


SAG1304 


32 


hypothetical protein 


ORF01460 






homocysteine S-methyltransferase MmuM, putative


ORF01461 


SAG1306 


458


amino acid permease


ORF01463 


SAG1307 


216 


hypothetical protein


ORF01464 


SAG1308 


167 


hypothetical protein


ORF01465 


SAG1309 


30 


hypothetical protein


ORF01466 


SAG1310 


182 


transcriptional regulator, TetR family


ORF01467 


SAG1311 


198 


GTP-binding protein


ORF01468 


SAG1312 


408 


ATP-dependent Clp protease, ATP-binding subunit
ClpX


ORF01469 


SAG1313 


56 


conserved hypothetical protein 


ORF01470 


SAG1314 


164 


dihydrofolate reductase 


ORF01471 


SAG1315 


279 


thymidylate synthase 


ORF01472 


SAG1316 


390 


HMG-CoA synthase 


ORF01473 


SAG1317 


427 


3-hydroxy-3-methylglutaryl-CoA reductase 


ORF01474 


SAG1318 


149 


conserved hypothetical protein 


ORF01475 


SAG1319 


187 


hemolysin III, putative 


ORF01476 


SAG1320 


304 


conserved hypothetical protein TIGR00147 


ORF01477 


SAG1321 


284 


glutathione S-transferase family protein 


ORF01478 


SAG1322 


72 


conserved domain protein 


ORF01479 


SAG1323 


331 


isopentenyl-diphosphate delta-isomerase


ORF01480 


SAG1324 


330 


phosphomevalonate kinase 


ORF01481 


SAG1325 


314 


diphosphomevalonate decarboxylase 


ORF01482 


SAG1326 


292 


mevalonate kinase, putative


ORF01483 


SAG1327 


409 


sensor histidine kinase 


ORF01484 


SAG1328 


228 


DNA-binding response regulator 


ORF01485 


SAG1329 


208


GTP pyrophosphokinase family protein 


ORF01486


SAG1330 


68 


hypothetical protein 


ORF01488 


SAG1331 


979


R5 protein 


ORF01489 


SAG1332 


146 


transcriptional regulator, MarR family, putative 


ORF01490 


SAG1333 


690 


5'-nucleotidase family protein


ORF01491 


SAG1334 


136 


polypeptide deformylase, putative 


ORF01492 


SAG1335 


449 


NADP-specific glutamate dehydrogenase 


ORF01494 


SAG1336 


169 


conserved hypothetical protein 


ORF01495 


SAG1337 


589 


ABC transporter, ATP-binding/permease protein 


ORF01496 


SAG1338 


579 


ABC transporter, ATP-binding/permease protein 


ORF01497 


SAG1339 


157 


acetyltransferase, GNAT family 
Table 32: Conversion of ORF Ref Nos. with SAG Ref Nos.



ORF Ref No.


SAGxxxx Ref No. 


aa 


Annotation 


ORF01498 


SAG1340 


622 


ABC transporter, ATP-binding protein


ORF01499 


SAG1341 


402 


polyA polymerase family protein 


ORF01500 


SAG1342 


282 


DegV family protein 


ORF01501 


SAG1343 


126 


conserved hypothetical protein 


ORF01502


SAG1344 


177 


hypothetical protein 


ORF01503 


SAG1345


164


conserved hypothetical protein


ORF01504 


SAG1346 


641 


PTS system, fructose specific IIABC components


ORF01505 


SAG1347 


303 


1-phosphofructokinase


ORF01506 


SAG1348 


247 


lactose phosphotransferase system repressor


ORF01507 


SAG1349 


411 


beta-lactam resistance factor


ORF01508 


SAG1350 


544 


surface antigen-related protein


ORF01509 


SAG1351 


307 


2-dehydropantoate 2-reductase, putative


ORF01510 


SAG1352 


356 


regulatory protein, putative


ORF01511 


SAG1353


^0 

www 


pyridine nucleotide-disulphide oxidoreductase family
protein 


ORF01512


SAG1354


951 
1 


tRNA (guanine-N1)-methyltransferase


ORF01513 


SAG1355


179 

1/4- 


16S rRNA processing protein RimM


ORF01515 




5trc 

www 


uanscnpuonai regulator, kota Tamuy 


ORF01516 


SAG1357 


80


KH domain protein 


ORF01517




90 


ribosomal protein S16


virvru io i o 


SAG1359


415 


permease, putative 




SAG1360


236 


ABC transporter, ATP-binding protein 




SAG1361


414 


conserved hypothetical protein 


URrU IOmCsC 




532 


carbamoyl-phosphate synthase, large subunit, putative 


ORF01523 


SAG1363


*35fi 


carbamoyl-phosphate synthase, small subunit


ORF01524 


SAG1364


17^ 
I/O 


pyrimidine operon regulatory protein


ORF01525 


SAG1365




ribosomal large subunit pseudouridine synthase, RluD
subfamily 


ORF01526 


SAG1366 


154 


lipoprotein signal peptidase


ORF01527 


SAG1367 


301 


transcriptional regulator, LysR family 


ORF01528 


SAG1368 


94 


ribosomal protein L27


ORF01529 


SAG1369 


112 


conserved hypothetical protein 


ORF01530 


SAG1370


104 


ribosomal protein L21


ORF01531 


SAG1371 


392 


conserved hypothetical protein 


ORF01532 


SAG1372 


404 


thiamine biosynthesis protein ThiI


ORF01533 


SAG1373 


381 


cysteine desulphurase 


ORF01535 


SAG1374 


150 


conserved hypothetical protein 


ORF01536 


SAG1375 


449 


glutathione reductase 


ORF01537 


SAG1376 


111 


conserved hypothetical protein 


ORF01538 


SAG1377 


388 


chorismate synthase 


ORF01539 


SAG1378 


355 


3-dehydroquinate synthase 


ORF01540 


SAG1379 


225 


3-dehydroquinate dehydratase 


ORF01541 


SAG1380


385 


conserved hypothetical protein 


ORF01542 


SAG1381


714


sulfatase 


ORF01543 


SAG1382 


119 


ribosomal protein L20


ORF01544 


SAG1383 


66 


ribosomal protein L35


ORF01545 


SAG1384 


176 


translation initiation factor IF-3 


ORF01546 


SAG1385 


227 


cytidylate kinase 


ORF01547 


SAG1386 


174 


conserved hypothetical protein 


ORF01548 


SAG1387 


65 


ferredoxin, 4Fe-4S 


ORF01549


SAG1388


163


conserved hypothetical protein 


ORF01550 


SAG1389 


406 


peptidase T


ORF01551 


SAG1390 


544 


polysaccharide biosynthesis protein, putative 
Table 32: Conversion of ORF Ref Nos. with SAG Ref Nos.



ORF Ref No. 


SAGxxxx Ref No.


aa


Annotation


ORF01552 


SAG1391 


484


UDP-N-acetylmuramoylalanyl-D-glutamate--2,6-diaminopimelate ligase


ORF01553 


SAG1392 


264 


iron compound ABC transporter, ATP-binding protein 


ORF01554 


SAG1393 


310


iron compound ABC transporter, substrate-binding
protein


ORF01555 






iron compound ABC transporter, permease protein


ORF01556


SAG1395


ooo 


iron compound ABC transporter, permease protein


ORF01557


SAG1396 


217 


conserved hypothetical protein 


ORF01558


SAG1397 


311 


inorganic pyrophosphatase, manganese-dependent 


ORF01559


SAG1398 


262 


pyruvate formate-lyase-activating enzyme 


ORF01560 


SAG1399 


444 


CBS domain protein


ORF01561 


SAG1400 


188 


conserved hypothetical protein 


ORFOI *5fi^ 
wrvru i www 


SAG1401


311 


conserved hypothetical protein TIGR01212 


ORF01564


SAG1402 


213 


PAP2 family protein 


ORF01565


SAG1403


194 


membrane protein, putative 


ORF01566


SAG1404


308 


cell wall surface anchor family protein 




SAG1405


294 


sortase family protein




SAG1406


293 


sortase family protein


ORF01569


SAG1407 


705


cell wall surface anchor family protein


ORF01570


SAG1408


0Q1 

wW 1 


cell wall surface anchor family protein


ORF01571 


SAG1409 


326 


transcriptional regulator, RofA family FRAMESHIFT


ORF01572


SAG1410


379 


glycosyl transferase, group 1 family protein


ORF01573


SAG1411


282 


exopolysaccharide biosynthesis protein, putative


ORF01574


SAG1412


474 


exopolysaccharide biosynthesis protein, putative 


ORF01575


SAG1413 


454 


hypothetical protein


ORF01576


SAG1414 


308 


glycosyl transferase, group 2 family protein 


ORF01577


SAG1415 


311 


glycosyl transferase, group 2 family protein 


ORF01578


SAG1416 


352 


dTDP-glucose 4,6-dehydratase, putative 


ORF01579 


SAG1417 


240 


4-diphosphocytidyl-2C-methyl-D-erythritol synthase, 
putative 


ORF01580 


SAG1418 


259 


LicD protein, putative


ORF01581 


SAG1419 


577 


hypothetical protein 


ORF01582 


SAG1420 


117 


conserved hypothetical protein 


ORF01583 


SAG1421 


243 


glycosyl transferase, group 2 family protein 


ORF01584 


SAG1422 


313 


glycosyl transferase, group 2 family protein 


ORF01585 


SAG1423 


384 


conserved hypothetical protein 




SAG1424


4^w*t 


dTDP-4-dehydrorhamnose reductase


ORF01587


SAG1425


11^ 


conserved hypothetical protein


ORF01589




369 


RNA polymerase sigma-70 factor


ORF01590




WWfc 


DNA primase


ORF01591


SAG1428 


125 


large conductance mechanosensitive channel protein


ORF01592 


SAG1429 


58 


ribosomal protein S21 


ORF01593 


SAG1430 


167 


conserved hypothetical protein


ORF01594 


SAG1431 


268 


amino acid ABC transporter, amino acid-binding 
protein 


ORF01596 


SAG1432 


347 


ammonium transporter family protein 


ORF01597 


SAG1433 


375 


conserved hypothetical protein 


ORF01598 


SAG1434 


328 


rhodanese family protein 


ORF01599


SAG1435 


101 


conserved hypothetical protein 


ORF01600 


SAG1436 


457 


glycerol-3-phosphate transporter, putative 
Table 32: Conversion of ORF Ref Nos. with SAG Ref Nos.



ORF Ref No.


SAGxxxx Ref No. 


aa 


Annotation 


ORF01601 


SAG1437 


55 


hypothetical protein 


ORF01602 


SAG1438


754 


glycogen phosphorylase 


ORF01603 


SAG1439


498 


4-alpha-glucanotransferase 


ORF01604 


SAG1440


342 


maltose operon repressor MalR, putative


ORF01605 


SAG1441 


415 


maltose/maltodextrin ABC transporter, 
maltose/maltodextrin-binding protein 


ORF01606 


SAG1442 


456 


maltose ABC transporter, permease protein 


ORF01607 


SAG1443 


278 


maltose ABC transporter, permease protein


ORF01608


SAG1444 


490 


proton/peptide symporter family protein


ORF01610 


SAG1445 




MutT/nudix family protein, FRAMESHIFT


ORF01611 


SAG1446 


62 


hypothetical protein 


ORF01612 


SAG1447 


441 


conserved hypothetical protein 


ORF01613 


SAG1448


502 


glycosyl transferase, group 1 family protein 


ORF01614 


SAG1449 


795 


preprotein translocase SecA subunit, putative


ORF01615 


SAG1450 


330 


conserved domain protein 


ORF01617 


SAG1451 


494 


conserved hypothetical protein 


ORF01618 


SAG1452 


514 


conserved hypothetical protein 


ORF01619 


SAG1453 


409 


preprotein translocase SecY family protein 


ORF01621 


SAG1454 


398 


conserved hypothetical protein 


ORF01622 


SAG1455 


295 


glycosyl transferase, group 2 family protein 


ORF01623 


SAG1456 


312 


glycosyl transferase, family 8, degenerate 


ORF01624 


SAG1457 


129 


IS1381, transposase OrfB


ORF01625 


SAG1458 


127 


IS1381, transposase OrfA 


ORF01626 


SAG1459 


413 


glycosyl transferase, family 8


ORF01627 


SAG1460 


401 


glycosyl transferase, family 8 


ORF01628 


SAG1461 


335 




ORF01630 


SAG1462 


970 


cell wall surface anchor family protein


ORF01632 


SAG1463 




transcriptional regulator, RofA family POINT
MUTATION


ORF01634 


SAG1464 


663 


excinuclease ABC, B subunit 


ORF01635 


SAG1465 


306 


protease, putative


ORF01636 


SAG1466


727 


glutamine ABC transporter, glutamine-binding
protein/permease protein, putative 


ORF01637 


SAG1467 


246 


glutamine ABC transporter, ATP-binding protein, GlnQ 
putative 


ORF01638 


SAG1468 


116 


conserved hypothetical protein 


ORF01639


SAG1469 


52 


conserved hypothetical protein 


ORF01640 


SAG1470 


437 


GTP-binding protein, GTP1/Obg family 


ORF01641 


SAG1471 


42 


conserved hypothetical protein 


ORF01643 


SAG1472 


413 


aminopeptidase PepS 


ORF01645 


SAG1473 


192 


cell wall surface anchor family protein 


ORF01646 


SAG1474 


680 


amidase family protein 


ORF01647 


SAG1475 


240 


ribosomal small subunit pseudouridine synthase A


ORF01648 


SAG1476 


280 


oxidoreductase, aldo/keto reductase family 


ORF01650 


SAG1477 


224 


nitroreductase family protein 


ORF01651 


SAG1478


130 


lactoylglutathione lyase 


ORF01652 


SAG1479 


308 


glycosyl transferase, group 2 family protein 


ORF01653 


SAG1480 


462 


amino acid permease 


ORF01654 


SAG1481 


155 


SsrA-binding protein 


ORF01655 


SAG1482 


801 


exoribonuclease, VacB/Rnb family 


ORF01657 


SAG1483 


78 


preprotein translocase, SecG subunit 


ORF01658 


SAG1485 


369 


multi-drug resistance protein 


ORF01660 


SAG1486 


548 


hypothetical protein 


ORF01661 


SAG1487 


233 


ABC transporter, ATP binding protein 
Table 32: Conversion of ORF Ref Nos. with SAG Ref Nos.



ORF Ref No.


SAGxxxx Ref No. 


aa 


Annotation


ORF01662 


SAG1488 


195 


dephospho-CoA kinase 


ORF01663 


SAG1489 


273 


formamidopyrimidine-DNA glycosylase 


ORF01665


SAG1490 


282 


transcriptional regulator, MutR family 


ORF01666 


SAG1491 


530 


hypothetical protein 


ORF01667 


SAG1492 


58 


hypothetical protein 


ORF01668 


SAG1493 




hypothetical protein 


ORF01670 


SAG1494 


32 


hypothetical protein 


ORF01672 


SAG1495 


81 


protease, putative, POINT MUTATION 


ORF01673 


SAG1496 


110 


hypothetical protein 


ORF01674 


SAG1497 


37 


hypothetical protein 


ORF01675 


SAG1498 


133 


hypothetical protein 


ORF01677 


SAG1499 


299 


GTP-binding protein Era


ORF01678


SAG1500 


132 


diacylglycerol kinase


ORF01679 


SAG1501 


161 




ORF01680 


SAG1502 


268 


tetracenomycin polyketide synthesis O-methyltransferase TcmP, putative


ORF01681


SAG1503 


39 


hypothetical protein


ORF01682 


SAG1504 


38 


hypothetical protein


ORF01683 


SAG1505 


158 


MutT/nudix family protein


ORF01684 


SAG1506 


267 


hypothetical protein


ORF01685 


SAG1507 


345 


PhoH family protein


ORF01686 


SAG1508 


590 


67 kDa Myosin-crossreactive streptococcal antigen


ORF01687 


SAG1509 


71 


conserved hypothetical protein 


ORF01688 


SAG1510 


169 


peptide methionine sulfoxide reductase 


ORF01689 


SAG1511 


284 


conserved hypothetical protein 


ORF01690 


SAG1512 


185 


ribosome recycling factor 


ORF01691 


SAG1513 


242 


uridylate kinase 


ORF01692 


SAG1514 


226 


peptide ABC transporter, ATP-binding protein 


ORF01693 


SAG1515 


262 


peptide ABC transporter, ATP-binding protein 


ORF01694 


SAG1516 


255 


peptide ABC transporter, permease protein 


ORF01695 


SAG1517 


314 


peptide ABC transporter, permease protein 


ORF01696 


SAG1518 


525 


peptide ABC transporter, peptide-binding protein 


ORF01697 


SAG1519 


229 


ribosomal protein L1 


ORF01698 


SAG1520 


141 


ribosomal protein L11


ORF01699 


SAG1521


388 


transposase, IS30 family, putative 


ORF01700 


SAG1522 


460 


transporter, major facilitator family 


ORF01702 


SAG1523 


404 


peptidase, M20/M25/M40 family 


ORF01703 


SAG1524 


294 


transcriptional regulator, LysR family 


ORF01704 


SAG1525 


117 


conserved hypothetical protein


ORF01705 


SAG1526 


178 


IS861, transposase OrfA


ORF01706 


SAG1527 


977 


IS861, transposase OrfB


ORF01707 


SAG1528


571 


chorismate binding enzyme


ORF01708 


SAG1529 


785


FtsK/SpoIIIE family protein




SAG1530


267 


peptidyl-prolyl cis-trans isomerase, cyclophilin-type


ORF01710 


SAG1531


277 


manganese ABC transporter, permease protein 


ORF01711 


SAG1532 


238


manganese ABC transporter, ATP-binding protein 


ORF01712 


SAG1533


308 


manganese ABC transporter, manganese-binding 
adhesion lipoprotein


ORF01713 


SAG1534 


215 


iron-dependent transcriptional regulator 


ORF01714 


SAG1535 


229 


5-methylthioadenosine nucleosidase/S-adenosylhomocysteine nucleosidase


ORF01715 


SAG1536 


89 


conserved hypothetical protein 


ORF01716 


SAG1537 


184 


MutT/nudix family protein 
Table 32: Conversion of ORF Ref Nos. with SAG Ref Nos.



ORF Ref No.


SAGxxxx Ref No.


aa 


Annotation


ORF01718 


SAG1538 


459


UDP-N-acetylglucosamine pyrophosphorylase 


ORF01719 


SAG1539 


31 


hypothetical protein 


ORF01720


SAG1540 


137 


conserved hypothetical protein 


ORF01721


SAG1541 


125 


glyoxalase family protein


ORF01722 


SAG1542 


318 


oxidoreductase, Gfo/Idh/MocA family


ORF01724 


SAG1543 




conserved hypothetical protein, FRAMESHIFT 


ORF01725


SAG1544


232 


gluconate 5-dehydrogenase, putative 


ORF01726 


SAG1545 


78 


conserved hypothetical protein 


ORF01727 


SAG1546 


82 


conserved hypothetical protein


ORF01729 


SAG1547 


166 


acetyltransferase, GNAT family


ORF01730 


SAG1548 


422 


glycosyl transferase, group 2 family protein


ORF01731 


SAG1549 


127 


IS1381, transposase OrfA 


ORF01732 


SAG1550 


129 


IS1381, transposase OrfB 


ORF01733 


SAG1551 


67 


hypothetical protein 


ORF01734 


SAG1552 


719 


conserved hypothetical protein 


ORF01735 


SAG1553 


477 


hypothetical protein 


ORF01736 


SAG1554 


225 


hypothetical protein 


ORF01737 


SAG1555 


231 


hypothetical protein 


ORF01738 


SAG1556 


445 


branched-chain amino acid transport system II carrier
protein 


ORF01739 


SAG1557 


665 


methionyl-tRNA synthetase 


ORF01740 


SAG1558 


291 


tellurite resistance protein TehB 


ORF01741 


SAG1559


2^1 


membrane protein, putative


ORF01742 


SAG1560


40 




ORF01743 


SAG1561


405 


r i u oy diet ii wwifi|j\ji id Ji, fjuiciiivc 


ORF01744 


SAG1562


280 


conserved hypothetical protein


ORF01745 


SAG1563 


275 


exodeoxyribonuclease


ORF01746 


SAG1564


118 


conserved hypothetical protein


ORF01747 


SAG1565


158 


methylated-DNA--protein-cysteine S-methyltransferase


ORF01748 


SAG1566 


393 


D-isomer specific 2-hydroxyacid dehydrogenase family 
protein 


ORF01749 


SAG1567 


182


acetyltransferase, GNAT family 


ORF01750 


SAG1568 




phosphoserine aminotransferase FRAMESHIFT 


ORF01752 


SAG1569 


211 


copper homeostasis protein CutC, putative 


ORF01753 


SAG1570 


34 


conserved hypothetical protein 


ORF01754 


SAG1571 


53 


hypothetical protein 


ORF01755 


SAG1572 


287 


tetrapyrrole methylase family protein


ORF01756 


SAG1573 


108 


conserved hypothetical protein 


ORF01758


SAG1574


C.OI 


DNA polymerase III, delta prime subunit, putative


ORF01759 


SAG1575 


211 


thymidylate kinase 


ORF01761


SAG1576




transposase, IS30 family, putative, truncation 


ORF01763 


SAG1577 


219 


AcuB family protein 


ORF01764


SAG1578 


236 


branched-chain amino acid ABC transporter, ATP- 
binding protein 


ORF01765 


SAG1579 


254 


branched-chain amino acid ABC transporter, ATP-binding protein


ORF01766 


SAG1580 


317 


branched-chain amino acid ABC transporter, 
permease protein 


ORF01767 


SAG1581 


289 


branched-chain amino acid ABC transporter, 
permease protein 


ORF01769 


SAG1582 


388 


branched-chain amino acid ABC transporter, amino 
acid-binding protein 


ORF01770 


SAG1583 


81 


conserved hypothetical protein 


ORF01772 


SAG1584 


377 


IS1548, transposase 
Table 32: Conversion of ORF Ref Nos. with SAG Ref Nos.



ORF Ref No. 


SAGxxxx Ref No.


aa 


Annotation 


ORF01773 


SAG1585 


196 


ATP-dependent Clp protease, proteolytic subunit ClpP


ORF01774 


SAG1586 


209 


uracil phosphoribosyltransferase


ORF01775 


SAG1587 


389 


aminotransferase, class I


ORF01777 


SAG1588 


182 


RNA n*!Pthvltran^ff»raQP TrmH family/ nmnn? 


ORF01778 


SAG1589 


450 


amino acid permease, putative


ORF01779 


SAG1590 


449 


potassium uptake protein, Trk family


ORF01780 


SAG1591 


475 


cation uptake protein, Trk family


ORF01781 


SAG1592 


83 


conserved hvnnfhatinal nrntpin TIRRfln07ft 


ORF01782 


SAG1593 


240 


ribosomal large subunit pseudouridine synthase B


ORF01783 


SAG1594 


194 


conserved hypothetical protein TIGR00281 


ORF01784 


SAG1595 


235 


Uncharacterized ACR, COG 1354 


ORF01785 


SAG1596 


246 


integrase/recombinase, phage integrase family


ORF01786 


SAG1597 


157 


CBS domain protein 


ORF01787


SAG1598 


173 


conserved hypothetical protein 


ORF01788 


SAG1599 


324 


HAM1 protein 


ORF01789 


SAG1600 


264 


glutamate racemase 


ORF01790 


SAG1601 


79 


conserved hypothetical protein 


ORF01791 


SAG1602




membrane protein, putative


ORF01792 


SAG1603


173 


transcriptional regulator, biotin repressor family


ORF01793 


SAG1604




membrane protein, putative


ORF01794 


SAG1605 


167 


conserved hypothetical protein


V-T\rV/ I # *JsJ 


SAG1606


OVl"7 


RNA methyltransferase, TrmH family


ORF017QR 


SAG1607




acylphosphatase 




SAG1608




membrane protein, putative 


ORF01799


SAG1609


^1 


amino acid ABC transporter, permease protein 


ORF01800


SAG1610


ZOD 


amino acid ABC transporter, substrate-binding protein 


ORF01801 


SAG1611 


486 


amidase family protein


ORF01802 


SAG1612 


160 




ORF01803 


SAG1613 


600 


Uncharacterized BCR, YceG family COG1559, putative


ORF01804 


SAG1614 


167 


acetyltransferase, GNAT family 


ORF01805 


SAG1615 


443 


UDP-N-acetylmuramate--alanine ligase


ORF01806 


SAG1616 


205 


conserved hypothetical protein 


ORF01807 


SAG1617 


32 


hypothetical protein 


ORF01808 


SAG1618 


1032 


Snf2 family protein 


ORF01810 


SAG1619 


377 


IS1548, transposase 


ORF01811 


SAG1620 


436 


phosphoglycerate dehydrogenase-related protein


ORF01812 


SAG1621 


300 


primosomal protein DnaI


ORF01813 


SAG1622 


391 


conserved hypothetical protein 


ORF01814 


SAG1623


159 


conserved hypothetical protein TIGR00244 


ORF01815 


SAG1624 


501 


sensor histidine kinase CsrS 


ORF01816 


SAG1625 


229 


DNA-binding response regulator CsrR 


ORF01817 


SAG1626 


177 


conserved hypothetical protein 


ORF01818 


SAG1627 


296


heat shock protein HtpX 


ORF01820 


SAG1628 


184 


lemA protein 


ORF01821 


SAG1629 


237 


glucose-inhibited division protein B 


ORF01822 


SAG1630 


459 


sodium transport family protein 


ORF01823 


SAG1631 


223 


potassium uptake protein, Trk family, putative


ORF01824 


SAG1632 


276 


cobalt transport family protein 


ORF01825 


SAG1633 


558 


ABC transporter, ATP-binding protein 


ORF01826 


SAG1634 


212 


conserved hypothetical protein 


ORF01827 


SAG1635 


402 


sodium:dicarboxylate symporter family protein


Table 32: Conversion of ORF Ref Nos. with SAG Ref Nos.



ORF Ref No.


SAGxxxx Ref No.


aa


Annotation


ORF01828 


SAG1636 


455


branched-chain amino acid transport system II carrier protein


ORF01829 


SAG1637 


351 


alcohol dehydrogenase, zinc-containing 


ORF01830


SAG1638 


230 


ABC transporter, permease protein 


ORF01831 


SAG1639 


356 


ABC transporter, ATP-binding protein


ORF01832 


SAG1640 


458 


peptidase, M20/M25/M40 family 


ORF01833 


SAG1641 


274 


lipoprotein, putative


ORF01834 


SAG1642 


277 


ABC transporter, substrate-binding protein 


ORF01835 


SAG1643 


229


glutamine amidotransferase, class I


ORF01836 


SAG1644 


37 


hypothetical protein 


ORF01837 


SAG1645 


238 


conserved hypothetical protein TIGR01033 


ORF01838 


SAG1646 


32 


hypothetical protein 


ORF01839 


SAG1647 


328 


dihydroxyacetone kinase family protein 


ORF01840 


SAG1648 


178 


transcriptional regulator, TetR family, putative 


ORF01842 


SAG1649 


37


hypothetical protein 


ORF01843 


SAG1650 


329 


dihydroxyacetone kinase family protein 


ORF01844 


SAG1651 


192 


dihydroxyacetone kinase family protein 


ORF01845 


SAG1652 


124 


conserved hypothetical protein 


ORF01846 


SAG1653 


237 


glycerol uptake facilitator protein 


ORF01847 


SAG1654 


134 


conserved hypothetical protein 


ORF01848 


SAG1655 


237 


transcriptional regulator, MerR family 


ORF01849


SAG1656 


369 


conserved hypothetical protein


ORF01850 


SAG1657 


83 


hypothetical protein 


ORF01851 


SAG1658 


244 


conserved hypothetical protein 


ORF01852 


SAG1659 


118 


iojap-related protein 


ORF01853 


SAG1660 


173 


isochorismatase family protein 


ORF01854 


SAG1661 


195 


conserved hypothetical protein TIGR00488 


ORF01855


SAG1662 


210 


conserved hypothetical protein TIGR00482 


ORF01856 


SAG1663 


105 


conserved hypothetical protein TIGR00253


ORF01857 


SAG1664 


372 


GTP-binding protein 


ORF01858 


SAG1665 


177 


hydrolase, haloacid dehalogenase-like family 


ORF01859 


SAG1666 


295 


membrane protein 


ORF01860 


SAG1667 


480


glutamyl-tRNA(Gln) amidotransferase, B subunit 


ORF01861 


SAG1668 


488 


glutamyl-tRNA(Gln) amidotransferase, A subunit 


ORF01862 


SAG1669 


100 


glutamyl-tRNA(Gln) amidotransferase, C subunit 


ORF01863 


SAG1670 


881 


pyruvate phosphate dikinase 


ORF01864 


SAG1671


276


conserved hypothetical protein 


ORF01865 


SAG1672


170 


CBS domain protein 


ORF01866 


SAG1673 


377 


3-hydroxyacyl-CoA dehydrogenase family protein 


ORF01867 


SAG1674 


182 


isochorismatase family protein 


ORF01869 


SAG1675


261 


transcriptional regulator CodY, putative 


ORF01870 


SAG1676 


403 


aminotransferase, class I


ORF01871 


SAG1677 


137 


universal stress protein family FRAMESHIFT


ORF01872 


SAG1678 


460 


hydrolase, haloacid dehalogenase-like family 


ORF01873 


SAG1679 


320 


asparaginase family protein 


ORF01874 


SAG1680 


292


shikimate 5-dehydrogenase 


ORF01875 


SAG1681 


304


oxidoreductase, aldo/keto reductase family 


ORF01876 


SAG1682 


671 


ATP-dependent DNA helicase RecG 


ORF01877 


SAG1683 


512 


immunogenic secreted protein, putative 


ORF01878 


SAG1684


366 


alanine racemase 


ORF01879 


SAG1685


119 


holo-(acyl-carrier-protein) synthase 


ORF01880 


SAG1686 


335 


phospho-2-dehydro-3-deoxyheptonate aldolase


ORF01881


SAG1687


842


preprotein translocase, SecA subunit


ORF01882 


SAG1688 


315


mannose-6-phosphate isomerase, class I


ORF01883 


SAG1689


293 


fructokinase


ORF01885 


SAG1690


639 


PTS system IIABC components 


ORF01886 


SAG1691 


479 


sucrose-6-phosphate hydrolase 


ORF01887 


SAG1692 


320 


sucrose operon repressor ScrR 


ORF01888 


SAG1693


144 


N utilization substance protein B 


ORF01889


SAG1694 


129 


conserved hypothetical protein 


ORF01890 


SAG1695 


186 


translation elongation factor P 


ORF01892 


SAG1696 


38 


hypothetical protein 


ORF01893 


SAG1697 


48 


hypothetical protein 


ORF01894 


SAG1698 


99 


conserved hypothetical protein 


ORF01895 


SAG1699 


30 


hypothetical protein 


ORF01896 


SAG1700 


76 


hypothetical protein 


ORF01897 


SAG1701 


56 


hypothetical protein 


ORF01898 


SAG1702 


41 


hypothetical protein 


ORF01899


SAG1703 


54 


hypothetical protein 


ORF01900 


SAG1704 


150 


cytidine/deoxycytidylate deaminase family protein


ORF01902 


SAG1705 




peptidase, M24 family POINT MUTATION


ORF01903 


SAG1706 


238 


conserved hypothetical protein 


ORF01904


SAG1707 


499 


drug resistance transporter, EmrB/QacA family 


ORF01905 


SAG1708 


38 


hypothetical protein 


ORF01906 


SAG1709 


942 


excinuclease ABC, A subunit 


ORF01907 


SAG1710 


223 


conserved hypothetical protein 


ORF01908 


SAG1711 


314 


magnesium transporter, CorA family 


ORF01909 


SAG1712 


79 


ribosomal protein S18


ORF01910 


SAG1713 


163 


single-strand binding protein 


ORF01911 


SAG1714 


95 


ribosomal protein S6 


ORF01912 


SAG1715 


374 


A/G-specific adenine glycosylase 


ORF01913 


SAG1716 


197 


transcriptional regulator, Cro/CI family


ORF01914 


SAG1717 


104 


thioredoxin 


ORF01915 


SAG1718 


166 


PAP2 family protein 


ORF01916 


SAG1719 


779 


MutS2 family protein 


ORF01917 


SAG1720 


180 


conserved hypothetical protein 


ORF01918 


SAG1721


103 


conserved hypothetical protein 


ORF01919 


SAG1722 


297 


ribonuclease HIII


ORF01920 


SAG1723 


197 


signal peptidase I 


ORF01921 


SAG1724 


806 


helicase, putative 


ORF01922 


SAG1725 


160 


conserved hypothetical protein 


ORF01923 


SAG1726 


364 


DNA-damage inducible protein P 


ORF01924 


SAG1727 


770 


formate acetyltransferase 


ORF01925 


SAG1728 


124 


FMN-binding protein 


ORF01926 


SAG1729 


309 


conserved hypothetical protein 


ORF01927 


SAG1730 


251


proteinase, putative, degenerate, FRAMESHIFT 


ORF01928 


SAG1731 


298 


membrane protein, putative 


ORF01929 


SAG1732 


282 


glycerol uptake facilitator protein, putative 


ORF01930 


SAG1733


150 


universal stress protein family 


ORF01931 


SAG1734 


400 


transporter, putative 


ORF01932 


SAG1735 


219 


transcriptional regulator, Crp/Fnr family 


ORF01933 


SAG1736 


761 


X-pro dipeptidyl-peptidase 


ORF01934 


SAG1737 


119 


hypothetical protein 


ORF01936 


SAG1738 


326 


polyprenyl synthetase family protein 


ORF01937 


SAG1739 


582 


ABC transporter, ATP-binding protein CydC 


ORF01938 


SAG1740 


572 


ABC transporter, ATP-binding protein CydD 


ORF01939


SAG1741 


339 


cytochrome d ubiquinol oxidase, subunit II


ORF01940 


SAG1742 


475 


cytochrome d oxidase, subunit I 


ORF01941 


SAG1743 


402 


pyridine nucleotide-disulphide oxidoreductase family protein


ORF01942 


SAG1744 


299 


prenyltransferase, UbiA family


ORF01943 


SAG1745 


148 


hypothetical protein 


ORF01944 


SAG1746 


35 


hypothetical protein 


ORF01945 


SAG1747 


99 


conserved hypothetical protein TIGR00103


ORF01946 


SAG1748


396 


cyclopropane-fatty-acyl-phospholipid synthase 


ORF01947


SAG1749 


241 


transcriptional regulator, merR family 


ORF01948 


SAG1750 


195 


exonuclease 


ORF01949


SAG1751 


178 


conserved hypothetical protein 


ORF01950 


SAG1752 


375 


conserved hypothetical protein TIGR00275 


ORF01951 


SAG1753 


260 


conserved hypothetical protein 


ORF01952 


SAG1754 


89 


ribosomal protein S14 


ORF01953 


SAG1755 


38 


hypothetical protein 


ORF01954


SAG1756 


341 


conserved hypothetical protein 


ORF01957 


SAG1757 


336 


O-sialoglycoprotein endopeptidase family protein


ORF01958 


SAG1758 


135 


ribosomal-protein-alanine acetyltransferase, putative


ORF01960 


SAG1759 


230 


glycoprotease family protein, putative 


ORF01961


SAG1760 


76 


conserved hypothetical protein 


ORF01962


SAG1761 


559 


metallo-beta-lactamase superfamily protein 


ORF01963 


SAG1762 


169 


conserved hypothetical protein 


ORF01964


SAG1763


448


glutamine synthetase, type I


ORF01965 


SAG1764 


123 


transcriptional regulator GlnR


ORF01967


SAG1765 


179 


conserved hypothetical protein 


ORF01969 


SAG1766 


398 


phosphoglycerate kinase 


ORF01970 


SAG1767 


289 


acid phosphatase 


ORF01971 


SAG1768


336


glyceraldehyde 3-phosphate dehydrogenase


ORF01972 


SAG1769 


692 


translation elongation factor G 


ORF01973 


SAG1770 


156 


ribosomal protein S7 


ORF01974 


SAG1771 


137 


ribosomal protein S12 


ORF01975 


SAG1772 


270 


pur operon repressor


ORF01976 


SAG1773 


313 


HD domain protein 


ORF01977 


SAG1774


424 


conserved hypothetical protein 


ORF01978 


SAG1775


210 


conserved hypothetical protein 


ORF01979 


SAG1776 


220 


ribulose-phosphate 3-epimerase 


ORF01980 


SAG1777 


290 


conserved hypothetical protein TIGR00157 


ORF01981 


SAG1778


283 


rRNA (guanine-N1-)-methyltransferase, putative 


ORF01983 


SAG1779 


290 


dimethyladenosine transferase


ORF01984 


SAG1780 


163


hypothetical protein 


ORF01985 


SAG1781 


186 


primase-related protein 


ORF01987 


SAG1782 


260 


deoxyribonuclease, TatD family 


ORF01988 


SAG1783 


90 


hypothetical protein 


ORF01989 


SAG1784 


130 


hypothetical protein 


ORF01990 


SAG1785 


430 


hypothetical protein 


ORF01991 


SAG1786


130 


hypothetical protein 


ORF01992 


SAG1787 


420 


dltD protein 


ORF01993 


SAG1788 


79 


D-alanyl carrier protein 




SAG1789


421 


dltB protein 


ORF01996 


SAG1790 


511 


D-alanine-activating enzyme 


ORF01997 


SAG1791 


395 


sensor histidine kinase 


ORF01998


SAG1792 


224 


DNA-binding response regulator 


ORF01999 


SAG1793 


44 


ribosomal protein L34 


ORF02000 


SAG1794 


451 


membrane protein, putative


ORF02001 


SAG1795 


388 


transposase, IS30 family, putative 


ORF02002 


SAG1796 


575 


amino acid ABC transporter, permease protein 


ORF02004 


SAG1797 


407 


amino acid ABC transporter, ATP-binding protein


ORF02005 


SAG1798 


39 


hypothetical protein 


ORF02006 


SAG1799 


792 


xylulose-5-phosphate/fructose-6-phosphate phosphoketolase


ORF02007 


SAG1800 


363 


conserved hypothetical protein 


ORF02008 


SAG1801


559


transcriptional antiterminator, BglG family


ORF02009 


SAG1802


253 


conserved hypothetical protein 


ORF02010 


SAG1803 


505 


carbohydrate kinase, FGGY family 


ORF02011 


SAG1804 


329 


hypothetical protein 


ORF02012 


SAG1805 


483 


PTS system component, putative


ORF02015


SAG1806


318


glyoxylate reductase, NADH-dependent


ORF02016 


SAG1807 


339 


hypothetical protein 


ORF02017 


SAG1808 


327 


sugar-binding transcriptional regulator, LacI family


ORF02018 


SAG1809 


215 


transaldolase family protein 


ORF02019 


SAG1810 


238


carbohydrate isomerase, AraD/FucA family 


ORF02020 


SAG1811 


287 


hexulose-6-phosphate isomerase, putative


ORF02021 


SAG1812 


221 


hexulose-6-phosphate synthase, putative


ORF02022 


SAG1813 


161 


PTS system, IIA component 


ORF02023 


SAG1814 


92 


PTS system, IIB component


ORF02024


SAG1815


479


transport protein SgaT, putative 


ORF02025 


SAG1816


205 


hypothetical protein 


ORF02026 


SAG1817 


157 


hypothetical protein 


ORF02027 


SAG1818 


430 


adenylosuccinate synthetase 


ORF02028 


SAG1819 


340 


perfringolysin O regulator protein


ORF02029


SAG1820


224


conserved hypothetical protein


ORF02030 


SAG1821 


750 


glutamate--cysteine ligase-related protein


ORF02031


SAG1822


272


conserved hypothetical protein


ORF02032 


SAG1823


418 


conserved hypothetical protein 


ORF02033 


SAG1824 


291 


chaperonin, 33 kDa 


ORF02034 


SAG1825


325 


NifR3/Smm1 family protein 


ORF02035 


SAG1826


213


deoxynucleoside kinase family protein


ORF02036 


SAG1827 


163 


phosphinothricin N-acetyltransferase


ORF02037 


SAG1828 


815 


ATP-dependent Clp protease, ATP-binding subunit


ORF02038 


SAG1829 


154 


transcriptional regulator CtsR 


ORF02039 


SAG1830 


153 


conserved hypothetical protein 


ORF02040 


SAG1831 


346 


translation elongation factor Ts 


ORF02041 


SAG1832 


256 


ribosomal protein S2 


ORF02042 


SAG1833 


186 


alkyl hydroperoxide reductase, subunit C 


ORF02043 


SAG1834 


510 


alkyl hydroperoxide reductase, subunit F 


ORF02044 


SAG1835 


134 


conserved hypothetical protein 


ORF02045 


SAG1836 


61 


conserved hypothetical protein 


ORF02046 


SAG1837 


468 


lysin, putative 


ORF02047 


SAG1838 


109 


holin, putative 


ORF02048 


SAG1839 


136 


conserved hypothetical protein


ORF02049 


SAG1840 


112 


hypothetical protein 




SAG1841 


76 


conserved domain protein 


ORF02051


SAG1842 


1224 


PblB, putative


ORF02053 


SAG1843 


240 


conserved hypothetical protein 


ORF02056


SAG1844


911 


conserved hypothetical protein 


ORF02057 


SAG1845 


42 


hypothetical protein 


ORF02058


SAG1846 


158 


hypothetical protein 


ORF02059 


SAG1847 


227 


conserved hypothetical protein 


ORF02060 


SAG1848 


114 


conserved hypothetical protein 


ORF02061 


SAG1849 


115 


hypothetical protein 





ORF02062 


SAG1850 


101 


hypothetical protein


ORF02063 


SAG1851 


111 


conserved domain protein


ORF02064 


SAG1852 


420 


conserved domain protein


ORF02066


SAG1853


180 




ORF02067 


SAG1854 


380 


conserved hypothetical protein


ORF02068 


SAG1855 


570 


terminase, large subunit, putative


ORF02069


SAG1856


161


hypothetical protein


ORF02070 


SAG1858


95


hypothetical protein


ORF02071 


SAG1859 


180 


site-specific recombinase, phage integrase family


ORF02072 


SAG1860 


154 


conserved hypothetical protein 


ORF02073 


SAG1861


11Q 


transcriptional regulator, Cro/CI family


ORF02075


SAG1862


86 




ORF02076 


SAG1863


1*3R 


single-strand binding protein


ORF02077


SAG1864




hypothetical protein 


ORF02078 


SAG1865


74 




ORF02079 


SAG1866


l \jtj 


conserved hypothetical protein


ORF02080


SAG1867


too 


conserved hypothetical protein


ORF02081 


SAG1868


IO*» 


hypothetical protein


ORF02082 




HOI 


type II DNA modification methyltransferase, putative


ORF02083


SAG1870


273 


DMA r^nlir»a#inn nrntoin P\nar mila4!tia 


ORF02084 


SAG1871


248


conserved hypothetical protein


ORF02085 


SAG1872 


200 


hypothetical protein


ORF02086


SAG1873


443


replicative DNA helicase


ORF02087 


SAG1874 


87


hypothetical protein


ORF02088 


SAG1875 


94 


conserved hypothetical protein 


ORF02089 




1 rO 


HNH endonuclease family protein


ORF02090


SAG1877




antirepressor protein, putative


ORF02091 


SAG1878


109


conserved domain protein


ORF02092


SAG1879


1 w 


hypothetical protein


ORF02093


SAG1880


w*t 


hypothetical protein


ORF02094 


SAG1881 


w 1 


hypothetical protein


ORF02095


SAG1882


190 


repressor protein, putative 


ORF02097 


SAG1884


134


hypothetical protein


ORF02098 


SAG1885 


3*56 


site-specific recombinase, phage integrase family


ORF02100 


SAG1886 


32 


hypothetical protein 


ORF02101 


SAG1887


689


Na+/H+ exchanger family protein


ORF02102 


SAG1888 


78


hypothetical protein


ORF02103


SAG1889 


317 


microcin immunity protein MccF, putative


ORF02104 


SAG1890 


631 


endopeptidase O


ORF02105 


SAG1891 


327 


oxidoreductase, Gfo/Idh/MocA family


ORF02107 


SAG1892 


358 


membrane protein, putative 


ORF02108 


SAG1893 


59 


hypothetical protein 


ORF02109 


SAG1894 


214 


Cyclic nucleotide-binding domain protein 


ORF02110 


SAG1895 


204 


polypeptide deformylase 


ORF02111 


SAG1896 


333 


sugar binding transcriptional regulator RegR 


ORF02112 


SAG1897 


634 


conserved hypothetical protein 


ORF02113 


SAG1898 


271 


PTS system, IID component 


ORF02114


SAG1899 


288 


PTS system, IIC component 


ORF02115 


SAG1900 


164 


PTS system, IIB component 


ORF02116 


SAG1901 


398 


glucuronyl hydrolase 


ORF02118 


SAG1902


144


PTS system, IIA component


ORF02119


SAG1903 


34 


hypothetical protein


ORF02120 


SAG1904 


270 


oxidoreductase, short-chain dehydrogenase/reductase family


ORF02121


SAG1905


212 


conserved hypothetical protein 


ORF02122 


SAG1906 


335


carbohydrate kinase, PfkB family


ORF02123 


SAG1907 


212 


2-dehydro-3-deoxyphosphogluconate aldolase/4-hydroxy-2-oxoglutarate aldolase


ORF02124 


SAG1908 


499 


hypothetical protein


ORF02125


SAG1909


204


nitroreductase family protein


ORF02126 


SAG1910 


141 


transcriptional regulator, MarR family 


ORF02127 


SAG1911 


i*tuo 


DNA polymerase III, alpha subunit, Gram-positive type


ORF02128 


SAG1912 


194 


N-acetylmuramoyl-L-alanine amidase, family 4 protein


ORF02129 


SAG1913 


617 


prolyl-tRNA synthetase 


ORF02130 


SAG1914 


419 


membrane-associated zinc metalloprotease, putative


ORF02131 


SAG1915 


264 


phosphatidate cytidylyltransferase 


ORF02132 


SAG1916 


250 


undecaprenyl diphosphate synthase 


ORF02133 


SAG1917 


113 


preprotein translocase, YajC subunit 


ORF02134


SAG1918


114 


conserved hypothetical protein 


ORF02135 


SAG1919 


387 


malate oxidoreductase 


ORF02136


SAG1920 


445 


citrate carrier protein, CCS family 


ORF02137 


SAG1921 


508 


sensor histidine kinase family protein 


ORF02138 


SAG1922 


229 


response regulator 


ORF02139 


SAG1923 


331 


UDP-glucose 4-epimerase 


ORF02140 


SAG1924 


535 


glucan 1,6-alpha-glucosidase


ORF02141 


SAG1925 


377 


sugar ABC transporter, ATP-binding protein


ORF02142 


SAG1926 


283 


helix-turn-helix domain protein, fis-type 


ORF02143 


SAG1927 


298 


lacX protein 


ORF02144 


SAG1928 


325 


tagatose 1,6-diphosphate aldolase


ORF02145 


SAG1929 


310 


tagatose-6-phosphate kinase


ORF02146 


SAG1930 


171 


galactose-6-phosphate isomerase, LacB subunit


ORF02147 


SAG1931 


141


galactose-6-phosphate isomerase, LacA subunit


ORF02148


SAG1932


816 


neuraminidase 


ORF02149 


SAG1933


482


PTS system, IIC component, putative


ORF02150 


SAG1934


101


PTS system, IIB component, putative


ORF02152 


SAG1935 


157 


PTS system, IIA component, putative


ORF02153


SAG1936 


258 


lactose phosphotransferase system repressor


ORF02156 


SAG1937 




streptococcal histidine triad family protein, degenerate, FRAMESHIFT


ORF02157 


SAG1938 


307 


adhesion lipoprotein, putative 


ORF02158 


SAG1939 


147 


conserved hypothetical protein TIGR00256 


ORF02159 


SAG1940 


738 


GTP pyrophosphokinase 


ORF02160


SAG1941


800


2',3'-cyclic-nucleotide 2'-phosphodiesterase


ORF02161 


SAG1942 


151 


nrdI protein, putative


ORF02162 


SAG1943 


345 


conserved hypothetical protein 


ORF02163 


SAG1944 


165 


conserved hypothetical protein 


ORF02164


SAG1945


345 


iron ABC transporter, iron-binding protein 


ORF02165 


SAG1946 


257 


DNA-binding response regulator 


ORF02166 


SAG1947 


549 


conserved hypothetical protein 


ORF02167 


SAG1948 


275


PTS system, IID component


ORF02168 


SAG1949 


269 


PTS system, IIC component


ORF02169 


SAG1950 


163 


PTS system, IIB component 


ORF02170 


SAG1951 


141 


PTS system, IIA component, putative 


ORF02171 


SAG1952 


353 


membrane protein, putative


ORF02172 


SAG1953 


60 


hypothetical protein


ORF02173 


SAG1954 


384 


hypothetical protein 


ORF02174 


SAG1955 


282 


ABC transporter, ATP-binding protein


ORF02175 


SAG1956 


96 


conserved domain protein 


ORF02176 


SAG1957 


250 


response regulator 


ORF02177


SAG1958


276


conserved hypothetical protein


ORF02178 


SAG1959 


727 


PTS system, IIABC components


ORF02179


SAG1960


551


sensor histidine kinase


ORF02180 


SAG1961 


225 


phosphate regulon response regulator PhoB


ORF02181 


SAG1962 


218 


phosphate transport system regulatory protein PhoU, putative


ORF02182 


SAG1963 


253 


phosphate ABC transporter, ATP-binding protein 


ORF02183 


SAG1964 


292 


phosphate ABC transporter, permease protein 


ORF02184 


SAG1965 


281 


phosphate ABC transporter, permease protein 


ORF02186 


SAG1966 


293 


hemolysin precursor, putative 


ORF02187 


SAG1967 


195 


hypothetical protein 


ORF02188 


SAG1968 


246 


conserved hypothetical protein TIGR00046 


ORF02189 


SAG1969 


317 


ribosomal protein L11 methyltransferase


ORF02190 


SAG1970 


102 


conserved hypothetical protein 


ORF02191 


SAG1971 


41 


hypothetical protein 


ORF02192 


SAG1972 


238 


transcriptional regulator, MerR family 


ORF02194 


SAG1973 


156 


acetyltransferase, GNAT family


ORF02195 


SAG1974 


152 


MutT/nudix family protein 


ORF02196 


SAG1975 


47 


hypothetical protein 


ORF02197 


SAG1976 


156 


conserved hypothetical protein


ORF02198 


SAG1977 


163 


acetyltransferase, GNAT family 


ORF02199 


SAG1978 


422 


ATPase, AAA family 


ORF02201 


SAG1979


253


hypothetical protein


ORF02202


SAG1980 


300 


ABC transporter, ATP-binding protein 


ORF02203


SAG1981 


68 


hypothetical protein 


ORF02205 


SAG1982


359 


transcriptional regulator, Cro/CI family 


ORF02206 


SAG1983 


105 


conserved hypothetical protein 


ORF02207 


SAG1984 


188 


conserved hypothetical protein TIGR00730


ORF02208 


SAG1985 


51 


hypothetical protein


ORF02209 


SAG1986 


375 


integrase, phage family, putative


ORF02210 


SAG1987 


61 


conserved hypothetical protein


ORF02211


SAG1988 


342 


conserved hypothetical protein 


ORF02212 


SAG1989 


139 


hypothetical protein 


ORF02213 


SAG1990 


127 


hypothetical protein 


ORF02214 


SAG1991 


204 


transcriptional regulator, Cro/CI family 


ORF02215 


SAG1992 


518 


conserved hypothetical protein


ORF02216 


SAG1993 


373 


site-specific recombinase, phage integrase family


ORF02217


SAG1994


108


conserved hypothetical protein 


ORF02219 


SAG1995 


210 


hypothetical protein 


ORF02221 


SAG1996 


263 


cell wall anchor protein-related protein


ORF02223 


SAG1997 


182 


hypothetical protein 


ORF02224 


SAG1998 


457 


hypothetical protein 


ORF02225 


SAG1999 


47 


hypothetical protein 


ORF02226 


SAG2000 


666 


membrane protein, putative 


ORF02227 


SAG2001 


756 


conjugal transfer protein, interruption-C


ORF02228 


SAG2002 


129 


IS1381, transposase OrfB 


ORF02229 


SAG2003 


127 


IS1381, transposase OrfA 


ORF02230 


SAG2005 


136 


conserved hypothetical protein 


ORF02231 


SAG2006


88 


conserved hypothetical protein 


ORF02232 


SAG2007 


317 


conserved hypothetical protein


ORF02233 


SAG2008


84 


conserved hypothetical protein 


ORF02234 


SAG2009 


88 


conserved hypothetical protein 


ORF02235 


SAG2010 


157 


hypothetical protein 


ORF02236 


SAG2011 


160 


conserved hypothetical protein 


ORF02237 


SAG2012 


90 


hypothetical protein


ORF02238 


SAG2013 


189


hypothetical protein


ORF02239 


SAG2014 


449 


hypothetical protein


ORF02240


SAG2015


99


transcriptional regulator, Cro/CI family 


ORF02241 


SAG2016 


125 


hypothetical protein 


ORF02242 


SAG2017 


429 


transcriptional regulator, Cro/CI family


ORF02243 


SAG2018 


553 


FtsK/SpoIIIE family protein


ORF02244 


SAG2019 


153 


hypothetical protein 


ORF02245 


SAG2020 


98 


hypothetical protein 


ORF02246 


SAG2021 


826 


cell wall surface anchor family protein


ORF02247


SAG2022


417


transposase, ISL3 family


ORF02249 


SAG2023 


546 


mercuric reductase 


ORF02250 


SAG2024 


130 


mercuric resistance operon regulatory protein MerR


ORF02251 


SAG2025 


522 


Mn2+/Fe2+ transporter, NRAMP family 


ORF02252 


SAG2026 


240 


membrane protein, putative 


ORF02253 


SAG2027


205


ABC transporter, ATP-binding protein


ORF02254 


SAG2028 


36


conserved hypothetical protein 


ORF02255 


SAG2029


284 


streptomycin resistance protein 


ORF02257 


SAG2030 


130 


hypothetical protein 


ORF02258 


SAG2031 


202 


hypothetical protein 


ORF02259 


SAG2032 


111 


conserved hypothetical protein 


ORF02260 


SAG2033 


162 


acetyltransferase, GNAT family 


ORF02261 


SAG2034 


247 


membrane protein, putative 


ORF02262 


SAG2035


300


ABC transporter, ATP-binding protein


ORF02263 


SAG2036 


68 


hypothetical protein 


ORF02264 


SAG2037 


358 


transcriptional regulator, Cro/CI family 


ORF02265 


SAG2038 


204 


PAP2 family protein 


ORF02266 


SAG2039 


98 


conserved hypothetical protein 


ORF02267 


SAG2040 


186 


conserved hypothetical nrotein TIGRQny^n 


ORF02268 


SAG2041 


287 


protease, putative 


ORF02269 


SAG2042 


100 


rhodanese family protein


ORF02270 


SAG2043 


255 


cAMP factor


ORF02271 


SAG2044 


62 


hypothetical protein 


ORF02272 


SAG2045 


1 / 57 


DNA topology modulation protein FlaR, putative


ORF02273 


SAG2046 




glycerol dehydrogenase, putative 


ORF02274 


SAG2047 


&OJ 




ORF02275 


SAG2048 


R14 


5-methyltetrahydrofolate--homocysteine methyltransferase, putative


ORF02276


SAG2049 


745 


5-methyltetrahydropteroyltriglutamate--homocysteine methyltransferase


ORF02277 


SAG2050 


107 


conserved hypothetical protein 


ORF02278 


SAG2051 


230 


branched-chain amino acid transport protein AzlC, putative


ORF02279 


SAG2052 


41


hypothetical protein 


ORF02280 


SAG2053 


1570 


serine protease, subtilase family, putative 


ORF02281 


SAG2054 


228 


DNA-binding response regulator 


ORF02282 


SAG2055 


462 


sensor histidine kinase 


ORF02283 


SAG2056 


202 


chromosome assembly-related protein 


ORF02285 


SAG2057 


833 


leucyl-tRNA synthetase 


ORF02286 


SAG2058 


415 


major facilitator family protein


ORF02287 


SAG2059 


281 


conserved hypothetical protein 


ORF02288 


SAG2060 






ORF02289 


SAG2061




glycosyl transferase, family 8


ORF02290 


SAG2062


1(9 


transcription antitermination protein NusG 


ORF02291 


SAG2063


DOU 


pathogenicity protein, putative 


ORF02292 


SAG2064


*57 


preprotein translocase, SecE subunit, putative


ORF02293


SAG2066


77*4 


penicillin-binding protein 2A


ORF02294


SAG2067




ribosomal large subunit pseudouridine synthase, RluD subfamily


ORF02295 


SAG2068 


546 


*-/ ■ 1 ,c - uioeaots piuicins ot uriKnown Tuncuon, putative 


ORF02296 


SAG2069 


403 


phosphopentomutase


ORF02297 


SAG2070 


223 


deoxyribose-phosphate aldolase


ORF02298 


SAG2071 


400 


Na+ dependent nucleoside transporter


ORF02300 


SAG2072 


259 


uridine phosphorylase


ORF02301 


SAG2073 


245 


transcriptional regulator, GntR family 


ORF02302 


SAG2074 


540 


chaperonin, 60 kDa


ORF02303 


SAG2075 


94 


chaperonin, 10 kDa


ORF02305 


SAG2076


267 


ABC transporter, ATP-binding protein


ORF02306 


SAG2077 


298 




ORF02307 


SAG2078 


320 


lipoprotein, putative 


ORF02308 


SAG2079


zoo 


hydrolase, haloacid dehalogenase-like family 


ORF02309 


SAG2080


ZOO 


glyoxalase family protein 


ORF02310 


SAG2081


Z4o 


conserved hypothetical protein 


ORF02311




OAK 


anaerobic ribonucleoside-triphosphate reductase


ORF02312 


SAG2083 




acetyltransferase, GNAT family


ORF02313 


SAG2084 


310 


virtil^nr*a fartnr fUiviihA nnto(!iiA 


ORF02314 


SAG2085


47 




ORF02315 


SAG2086 


723 


anaerobic ribonucleoside-triphosphate reductase


ORF02316 


SAG2087 


495 


conserved hypothetical protein


ORF02317 


SAG2088 


40 


hypothetical protein


ORF02318 


SAG2089 


105 




ORF02319 


SAG2090 


136 


rAnc:pn;pH h\/nothafir«ol nmtam Tlf^DPrncn 


ORF02320 


SAG2091 


88 




ORF02321 


SAG2092 


132 


conserved hypothetical protein


ORF02322 


SAG2093 


379 


recA protein


ORF02323 


SAG2094 




competence/damage-inducible protein CinA FRAMESHIFT


ORF02325 


SAG2095 


183 


DNA-3-methyladenine glycosylase I


ORF02327 


SAG2096 


196 


Holliday junction DNA helicase RuvA


ORF02328


SAG2097 


418 


transporter, putative 


ORF02329 


SAG2098 


659 


DNA mismatch repair protein HexB 


ORF02330 


SAG2099 


33 


hypothetical protein 


ORF02331 


SAG2100 


67 


cold shock protein, CSD family 


ORF02332 


SAG2101 


858 


DNA mismatch repair protein HexA


ORF02333 


SAG2102 


145 


arginine repressor ArgR, putative 


ORF02334 


SAG2103 


563 


arginyl-tRNA synthetase


ORF02335 


SAG2104 


102 


conserved hypothetical protein 


ORF02337 


SAG2105 


290 


conserved hypothetical protein


ORF02338 


SAG2106 


314 


conserved hypothetical protein 


ORF02339 


SAG2107 


583 


aspartyl-tRNA synthetase


ORF02340 


SAG2108 


426 


histidyl-tRNA synthetase 


ORF02341 


SAG2109 


60 


ribosomal protein L32


ORF02342 


SAG2110 


to 


ribosomal protein L33


ORF02343


SAG2111


1 # o 


vonservca nypouieuc^u proiein 


ORF02344 

^1 WfaU 1 I 


SAG2112 


4Q4 


oiic^apecnic reuuiriuiriase, pnage iniegrasB tamiiy 


ORF02345 


SAG2113 


82 


conserved hvoothetical d ratlin > 


ORF02346 


SAG2114 


342 


conserved hvoothetical orotein 


| ORF02347 


SAG2115 


143 


hvoothetical nrotpin 


ORF02349 


SAG2116 


151 


conserved hvoothetical orotein 


ORF02350 


SAG2117 


71 


hypothetical orotein 


ORF02351 


SAG2118 


306 


transcriDtional reaulator Cro/m famihs 

i*jw« ipii wi iu I I v/UUIulvi | V/l v/ vi Id I 1 Illy 


ORF02352 


SAG2119 


373 


conserved domain orotein 


ORF02355 


SAG2120 


56 


hvDOthetical orotein 


ORF02356 


SAG2121 


176 


hypothetical protein 


ORF02357 


unw^ 1 




DNA-binding response regulator 


ORF02358 I 




ARA 


sensor nisuaine Kinase 


ORF02359 




O I / 


membrane protein, putative 


ORF02360 


55AR212*? 

O/AVJ^ 1 <CvJ 


ouo 


carbamate kinase 


ORF02361 






ornithine carbamoyltransferase 


ORF023B2 


onvj4 i z» f 


4^*1 


sensor histidine kinase 


ORF02363 






response regulator 


orfo5^SZ 

vlAI UtOUt 






amino acid ABC transporter, ATP-binding protein 


ORF02365 


On\J^ 1 OU 




amino acid ABC transporter, permease and amino acid 

hinHinn nmloin 
uiiiuniy piuiciii 


ORF02367 


SAG2131 


847 


IIIC7I lll^ioilO pIvrLCIIl, piiIallV6 


ORF02368 


SAG2132 


247 


kaji loci vcu iiypuuicuccii pioiein 


j ORF02369 


SAG2133 


118 


wi loci vcu i lypuu iciiuai pjuicin 


ORF02370 


SAG2134 


772 


mpmhranp nrntoin nutattw^ 
1 1 ici i iui c2i ic piULCIIl, |JUldUVo 


ORF02371 


SAG2135 


179 


transcriptional regulator, TetR family, putative 


ORF02372 


0/-\V3£. 1 OD 


OA 


conserved hypothetical protein 


ORF02373 


SAG2137 


203 


ribosomal protein S4 


ur\rUtO f *t 




nc 
95 


conserved hypothetical protein 


ORF02375 


SAG2139 


451 


replicat'rve DNA helicase 


| ORF02376 


SAG2140 


150 


ribosomal protein L9 


UKrUZo77 


SAG2141 


660 


DHH family protein 


ADCA0070 


SAG2142 


613 


glucose Inhibited division protein A 




SAG2143 


203 


conserved hypothetical protein TIGR00427 


URr02380 


SAG2144 


373 


tRNA (&.methylaminomethyl-2-thiouridylate)- 
methyltransferase 


UKrU^ool 


SAG2145 


222 


L-serine dehydratase, iron-sulfur-dependent, beta 
subunit 




SAG2146 


290 


L-serine dehydratase, iron-sulfur-dependent, alpha 
subunit 








conserved hypothetical protein ' 


ORF02^R4 


On04 l*rO 


i r y 


LysM domain protein 


ORF02385 






cobalt transport family protein 






zioU 


ABC transporter, ATP-binding protein 


ORF02387 


SAG2151 


279 


ABC tranSDorter ATP-bindina orotein FRAMF^HIFT 


ORF02388 


SAG2152 


180 


CDP-diacylgIyceroI~glycerol-3-phosphate3- 
phosphatidyltransferase 


ORF02389 


SAG2153 


427 


peptidase, M16 family 


ORF02390 


SAG2154 


414 


conserved hypothetical protein 


ORF02391 


SAG2155 


117 


conserved hypothetical protein 


ORF02392 


SAG2156 


369 


recF protein 


ORF02393 


SAG2157 


278 


transporter, putative 


ORF02395 


SAG2158 


220 


transcriptional regulator, Cro/CI family 
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Tabl 32: Conv rsi n f ORF Ref Nos. with SAG Ref Nos. 



ORFR f No. 


SAGxxxx Ref N . 


aa 


Annotation 


ORF02396  SAG2159  493    inosine-5'-monophosphate dehydrogenase
ORF02397  SAG2160  161    transcriptional regulator, ArgR family
ORF02398  SAG2161  226    transcriptional regulator, Crp/Fnr family
ORF02399  SAG2162  234    conserved hypothetical protein
ORF02400  SAG2163  410    arginine deiminase
ORF02401  SAG2164  136    acetyltransferase, GNAT family
ORF02402  SAG2165  337    ornithine carbamoyltransferase
ORF02403  SAG2166  475    arginine/ornithine antiporter
ORF02404  SAG2167  318    carbamate kinase
ORF02405  SAG2168  341    tryptophanyl-tRNA synthetase
ORF02406  SAG2169  230    conserved hypothetical protein
ORF02407  SAG2170  290    conserved hypothetical protein
ORF02408  SAG2171  539    ABC transporter, ATP-binding protein
ORF02409  SAG2172  859    ABC transporter, permease protein, putative
ORF02410  SAG2173  159    conserved hypothetical protein I lorvUU^lo
ORF02411  SAG2174  409    serine protease
ORF02412  SAG2175  257    partitioning protein, ParB family
ORF02413  SAG0001  453    chromosomal replication initiator protein DnaA
ORF02415  SAG0002  378    DNA polymerase III, beta subunit
ORF02416  SAG0003  293    diacylglycerol kinase catalytic domain protein, putative
ORF02417  SAG0004  65     conserved hypothetical protein
ORF02418  SAG0005  67     hypothetical protein
ORF02419  SAG0006  371    conserved hypothetical GTP-binding protein
ORF02420  SAG0007  191    peptidyl-tRNA hydrolase
ORF02421  SAG0008  1165   transcription-repair coupling factor
ORF02422  SAG0009  31     hypothetical protein
ORF02423  SAG0010  90     S4 domain protein
ORF02424  SAG0011  123    cell division protein DivIC, putative
ORF02425  SAG0012  44     conserved hypothetical protein
ORF02426  SAG0013  428    conserved hypothetical protein
ORF02427  SAG0014  424    MesJ/Ycf62 family protein
ORF02428  SAG0015  180    hypoxanthine-guanine phosphoribosyltransferase
ORF02429  SAG0016  658    cell division protein FtsH
ORF03000  SAG0157         Dnase-related protein, DEGENERATE
ORF03001  SAG0579  142    conserved hypothetical protein
ORF03002  SAG0580  111    conserved hypothetical protein, truncation
ORF03003  SAG0652         Tn5252, Orf 28 protein, degenerate
ORF03004  SAG0655  57     conserved hypothetical protein
ORF03005  SAG0662  101    cylX protein
ORF03006  SAG0917  83     Tn916, hypothetical protein
ORF03007  SAG0920  23     Tn916, hypothetical protein
ORF03008  SAG0922  61     Tn916, hypothetical protein
ORF03009  SAG0924  28     Tn916, tetM leader peptide
ORF03010  SAG0936  39     Tn916, hypothetical protein
ORF03011  SAG1484  48     ribosomal protein L33
ORF03012  SAG1857  119    HNH endonuclease family protein
ORF03013  SAG1883  128    conserved hypothetical protein
ORF03014  SAG2085  50     ribosomal protein L33
ORF03015  SAG2004  67     conjugal transfer protein, interruption-N


Table 33: List of GAS ORFs which are shared with GBS and Spn



gi|13621326|gb|AAK33146.1|
gi|13621327|gb|AAK33147.1|
gi|13621328|gb|AAK33148.1|
gi|13621329|gb|AAK33149.1|
gi|13621330|gb|AAK33150.1|
gi|13621331|gb|AAK33151.1|
gi|13621332|gb|AAK33152.1|
gi|13621333|gb|AAK33153.1|
gi|13621334|gb|AAK33154.1|
gi|13621335|gb|AAK33155.1|
gi|13621337|gb|AAK33156.1|
gi|13621340|gb|AAK33158.1|
gi|13621341|gb|AAK33159.1|
gi|13621343|gb|AAK33160.1|
gi|13621344|gb|AAK33161.1|
gi|13621346|gb|AAK33163.1|
gi|13621347|gb|AAK33164.1|
gi|13621348|gb|AAK33165.1|
gi|13621349|gb|AAK33166.1|
gi|13621350|gb|AAK33167.1|
gi|13621353|gb|AAK33169.1|
gi|13621354|gb|AAK33170.1|
gi|13621355|gb|AAK33171.1|
gi|13621357|gb|AAK33173.1|
gi|13621358|gb|AAK33174.1|
gi|13621359|gb|AAK33175.1|
gi|13621361|gb|AAK33176.1|
gi|13621362|gb|AAK33177.1|
gi|13621363|gb|AAK33178.1|
gi|13621364|gb|AAK33179.1|
gi|13621365|gb|AAK33180.1|
gi|13621366|gb|AAK33181.1|
gi|13621367|gb|AAK33182.1|
gi|13621368|gb|AAK33183.1|
gi|13621369|gb|AAK33184.1|
gi|13621370|gb|AAK33185.1|
gi|13621372|gb|AAK33186.1|
gi|13621373|gb|AAK33187.1|
gi|13621374|gb|AAK33188.1|
gi|13621375|gb|AAK33189.1|
gi|13621376|gb|AAK33190.1|
gi|13621377|gb|AAK33191.1|
gi|13621378|gb|AAK33192.1|
gi|13621379|gb|AAK33193.1|
gi|13621380|gb|AAK33194.1|
gi|13621382|gb|AAK33196.1|
gi|13621383|gb|AAK33197.1|
gi|13621384|gb|AAK33198.1|
gi|13621385|gb|AAK33199.1|
gi|13621386|gb|AAK33200.1|
gi|13621387|gb|AAK33201.1|
gi|13621388|gb|AAK33202.1|
gi|13621389|gb|AAK33203.1|
gi|13621390|gb|AAK33204.1|
gi|13621391|gb|AAK33205.1|
gi|13621392|gb|AAK33206.1|
gi|13621393|gb|AAK33207.1|
gi|13621394|gb|AAK33208.1|
gi|13621397|gb|AAK33210.1|
gi|13621398|gb|AAK33211.1|
gi|13621399|gb|AAK33212.1|
gi|13621401|gb|AAK33214.1|
gi|13621403|gb|AAK33215.1|
gi|13621404|gb|AAK33216.1|
gi|13621405|gb|AAK33217.1|
gi|13621407|gb|AAK33218.1|
gi|13621408|gb|AAK33219.1|
gi|13621409|gb|AAK33220.1|
gi|13621413|gb|AAK33224.1|
gi|13621415|gb|AAK33226.1|
gi|13621416|gb|AAK33227.1|
gi|13621418|gb|AAK33229.1|
gi|13621419|gb|AAK33230.1|
gi|13621424|gb|AAK33234.1|
gi|13621425|gb|AAK33235.1|
gi|13621426|gb|AAK33236.1|
gi|13621434|gb|AAK33243.1|
gi|13621450|gb|AAK33258.1|
gi|13621455|gb|AAK33262.1|
gi|13621456|gb|AAK33263.1|
gi|13621457|gb|AAK33264.1|
gi|13621467|gb|AAK33273.1|
gi|13621468|gb|AAK33274.1|
gi|13621469|gb|AAK33275.1|
gi|13621470|gb|AAK33276.1|
gi|13621471|gb|AAK33277.1|
gi|13621472|gb|AAK33278.1|
gi|13621473|gb|AAK33279.1|
gi|13621476|gb|AAK33281.1|
gi|13621477|gb|AAK33282.1|
gi|13621478|gb|AAK33283.1|
gi|13621480|gb|AAK33285.1|
gi|13621481|gb|AAK33286.1|
gi|13621491|gb|AAK33295.1|
gi|13621494|gb|AAK33298.1|
gi|13621496|gb|AAK33299.1|
gi|13621501|gb|AAK33304.1|
gi|13621502|gb|AAK33305.1|
gi|13621505|gb|AAK33307.1|
gi|13621506|gb|AAK33308.1|
gi|13621507|gb|AAK33309.1|
gi|13621510|gb|AAK33312.1|
gi|13621511|gb|AAK33313.1|
gi|13621513|gb|AAK33315.1|
gi|13621516|gb|AAK33317.1|
gi|13621518|gb|AAK33319.1|
gi|13621521|gb|AAK33322.1|
gi|13621522|gb|AAK33323.1|
gi|13621523|gb|AAK33324.1|
gi|13621524|gb|AAK33325.1|
gi|13621525|gb|AAK33326.1|
gi|13621527|gb|AAK33327.1|
gi|13621528|gb|AAK33328.1|
gi|13621529|gb|AAK33329.1|
gi|13621530|gb|AAK33330.1|
gi|13621531|gb|AAK33331.1|
gi|13621532|gb|AAK33332.1|
gi|13621533|gb|AAK33333.1|
gi|13621534|gb|AAK33334.1|
gi|13621535|gb|AAK33335.1|
gi|13621536|gb|AAK33336.1|
gi|13621537|gb|AAK33337.1|
gi|13621539|gb|AAK33338.1|
gi|13621540|gb|AAK33339.1|
gi|13621541|gb|AAK33340.1|
gi|13621542|gb|AAK33341.1|
gi|13621543|gb|AAK33342.1|
gi|13621544|gb|AAK33343.1|
gi|13621546|gb|AAK33345.1|
gi|13621547|gb|AAK33346.1|
gi|13621548|gb|AAK33347.1|
gi|13621550|gb|AAK33348.1|
gi|13621551|gb|AAK33349.1|
gi|13621552|gb|AAK33350.1|
gi|13621553|gb|AAK33351.1|
gi|13621554|gb|AAK33352.1|
gi|13621555|gb|AAK33353.1|
gi|13621557|gb|AAK33355.1|
gi|13621559|gb|AAK33356.1|
gi|13621560|gb|AAK33357.1|
gi|13621561|gb|AAK33358.1|
gi|13621562|gb|AAK33359.1|
gi|13621563|gb|AAK33360.1|
gi|13621564|gb|AAK33361.1|
gi|13621565|gb|AAK33362.1|
gi|13621566|gb|AAK33363.1|
gi|13621567|gb|AAK33364.1|
gi|13621569|gb|AAK33365.1|
gi|13621571|gb|AAK33367.1|
gi|13621572|gb|AAK33368.1|
gi|13621573|gb|AAK33369.1|
gi|13621574|gb|AAK33370.1|
gi|13621575|gb|AAK33371.1|
gi|13621576|gb|AAK33372.1|
gi|13621577|gb|AAK33373.1|
gi|13621579|gb|AAK33374.1|
gi|13621581|gb|AAK33376.1|
gi|13621582|gb|AAK33377.1|
gi|13621583|gb|AAK33378.1|
gi|13621584|gb|AAK33379.1|
gi|13621585|gb|AAK33380.1|
gi|13621586|gb|AAK33381.1|
gi|13621588|gb|AAK33383.1|
gi|13621589|gb|AAK33384.1|
gi|13621590|gb|AAK33385.1|
gi|13621592|gb|AAK33386.1|
gi|13621593|gb|AAK33387.1|
gi|13621594|gb|AAK33388.1|
gi|13621595|gb|AAK33389.1|
gi|13621596|gb|AAK33390.1|
gi|13621597|gb|AAK33391.1|
gi|13621598|gb|AAK33392.1|
gi|13621599|gb|AAK33393.1|
gi|13621600|gb|AAK33394.1|
gi|13621602|gb|AAK33395.1|
gi|13621603|gb|AAK33396.1|
gi|13621604|gb|AAK33397.1|
gi|13621605|gb|AAK33398.1|
gi|13621606|gb|AAK33399.1|
gi|13621607|gb|AAK33400.1|
gi|13621608|gb|AAK33401.1|
gi|13621609|gb|AAK33402.1|
gi|13621611|gb|AAK33404.1|
gi|13621614|gb|AAK33406.1|
gi|13621615|gb|AAK33407.1|
gi|13621616|gb|AAK33408.1|
gi|13621617|gb|AAK33409.1|
gi|13621618|gb|AAK33410.1|
gi|13621619|gb|AAK33411.1|
gi|13621620|gb|AAK33412.1|
gi|13621621|gb|AAK33413.1|
gi|13621622|gb|AAK33414.1|
gi|13621623|gb|AAK33415.1|
gi|13621624|gb|AAK33416.1|
gi|13621625|gb|AAK33417.1|
gi|13621627|gb|AAK33419.1|
gi|13621629|gb|AAK33420.1|
gi|13621630|gb|AAK33421.1|
gi|13621631|gb|AAK33422.1|
gi|13621633|gb|AAK33424.1|
gi|13621634|gb|AAK33425.1|
gi|13621636|gb|AAK33427.1|
gi|13621637|gb|AAK33428.1|
gi|13621638|gb|AAK33429.1|
gi|13621640|gb|AAK33430.1|
gi|13621642|gb|AAK33432.1|
gi|13621644|gb|AAK33434.1|
gi|13621645|gb|AAK33435.1|
gi|13621647|gb|AAK33437.1|
gi|13621648|gb|AAK33438.1|
gi|13621650|gb|AAK33440.1|
gi|13621651|gb|AAK33441.1|
gi|13621652|gb|AAK33442.1|
gi|13621657|gb|AAK33446.1|
gi|13621658|gb|AAK33447.1|
gi|13621660|gb|AAK33449.1|
gi|13621670|gb|AAK33458.1|
gi|13621671|gb|AAK33459.1|
gi|13621672|gb|AAK33460.1|
gi|13621675|gb|AAK33462.1|
gi|13621676|gb|AAK33463.1|
gi|13621678|gb|AAK33465.1|
gi|13621680|gb|AAK33467.1|
gi|13621681|gb|AAK33468.1|
gi|13621682|gb|AAK33469.1|
gi|13621683|gb|AAK33470.1|
gi|13621684|gb|AAK33471.1|
gi|13621685|gb|AAK33472.1|
gi|13621688|gb|AAK33474.1|
gi|13621689|gb|AAK33475.1|
gi|13621690|gb|AAK33476.1|
gi|13621691|gb|AAK33477.1|
gi|13621692|gb|AAK33478.1|
gi|13621693|gb|AAK33479.1|
gi|13621694|gb|AAK33480.1|
gi|13621695|gb|AAK33481.1|
gi|13621697|gb|AAK33483.1|
gi|13621698|gb|AAK33484.1|
gi|13621700|gb|AAK33485.1|
gi|13621701|gb|AAK33486.1|
gi|13621702|gb|AAK33487.1|
gi|13621714|gb|AAK33498.1|
gi|13621715|gb|AAK33499.1|
gi|13621717|gb|AAK33501.1|
gi|13621718|gb|AAK33502.1|
gi|13621719|gb|AAK33503.1|
gi|13621720|gb|AAK33504.1|
gi|13621726|gb|AAK33509.1|
gi|13621727|gb|AAK33510.1|
gi|13621729|gb|AAK33512.1|
gi|13621730|gb|AAK33513.1|
gi|13621731|gb|AAK33514.1|
gi|13621732|gb|AAK33515.1|
gi|13621733|gb|AAK33516.1|
gi|13621734|gb|AAK33517.1|
gi|13621735|gb|AAK33518.1|
gi|13621736|gb|AAK33519.1|
gi|13621741|gb|AAK33523.1|
gi|13621742|gb|AAK33524.1|
gi|13621743|gb|AAK33525.1|
gi|13621744|gb|AAK33526.1|
gi|13621745|gb|AAK33527.1|
gi|13621747|gb|AAK33528.1|
gi|13621756|gb|AAK33537.1|
gi|13621773|gb|AAK33552.1|
gi|13621774|gb|AAK33553.1|
gi|13621775|gb|AAK33554.1|
gi|13621777|gb|AAK33556.1|
gi|13621778|gb|AAK33557.1|
gi|13621779|gb|AAK33558.1|
gi|13621781|gb|AAK33559.1|
gi|13621782|gb|AAK33560.1|
gi|13621785|gb|AAK33563.1|
gi|13621786|gb|AAK33564.1|
gi|13621787|gb|AAK33565.1|
gi|13621788|gb|AAK33566.1|
gi|13621789|gb|AAK33567.1|
gi|13621790|gb|AAK33568.1|
gi|13621793|gb|AAK33571.1|
gi|13621794|gb|AAK33572.1|
gi|13621796|gb|AAK33573.1|
gi|13621797|gb|AAK33574.1|
gi|13621799|gb|AAK33576.1|
gi|13621800|gb|AAK33577.1|
gi|13621802|gb|AAK33579.1|
gi|13621806|gb|AAK33583.1|
gi|13621808|gb|AAK33584.1|
gi|13621809|gb|AAK33585.1|
gi|13621810|gb|AAK33586.1|
gi|13621811|gb|AAK33587.1|
gi|13621812|gb|AAK33588.1|
gi|13621813|gb|AAK33589.1|
gi|13621814|gb|AAK33590.1|
gi|13621817|gb|AAK33592.1|
gi|13621818|gb|AAK33593.1|
gi|13621819|gb|AAK33594.1|
gi|13621820|gb|AAK33595.1|
gi|13621821|gb|AAK33596.1|
gi|13621822|gb|AAK33597.1|
gi|13621823|gb|AAK33598.1|
gi|13621824|gb|AAK33599.1|
gi|13621825|gb|AAK33600.1|
gi|13621826|gb|AAK33601.1|
gi|13621828|gb|AAK33602.1|
gi|13621829|gb|AAK33603.1|
gi|13621830|gb|AAK33604.1|
gi|13621831|gb|AAK33605.1|
gi|13621834|gb|AAK33608.1|
gi|13621835|gb|AAK33609.1|
gi|13621836|gb|AAK33610.1|
gi|13621837|gb|AAK33611.1|
gi|13621839|gb|AAK33612.1|
gi|13621840|gb|AAK33613.1|
gi|13621841|gb|AAK33614.1|
gi|13621842|gb|AAK33615.1|
gi|13621843|gb|AAK33616.1|
gi|13621844|gb|AAK33617.1|
gi|13621898|gb|AAK33667.1|
gi|13621901|gb|AAK33670.1|
gi|13621902|gb|AAK33671.1|
gi|13621904|gb|AAK33672.1|
gi|13621907|gb|AAK33675.1|
gi|13621908|gb|AAK33676.1|
gi|13621909|gb|AAK33677.1|
gi|13621910|gb|AAK33678.1|
gi|13621912|gb|AAK33680.1|
gi|13621924|gb|AAK33690.1|
gi|13621929|gb|AAK33694.1|
gi|13621930|gb|AAK33695.1|
gi|13621931|gb|AAK33696.1|
gi|13621933|gb|AAK33698.1|
gi|13621934|gb|AAK33699.1|
gi|13621935|gb|AAK33700.1|
gi|13621936|gb|AAK33701.1|
gi|13621937|gb|AAK33702.1|
gi|13621938|gb|AAK33703.1|
gi|13621939|gb|AAK33704.1|
gi|13621942|gb|AAK33706.1|
gi|13621944|gb|AAK33708.1|
gi|13621945|gb|AAK33709.1|
gi|13621946|gb|AAK33710.1|
gi|13621950|gb|AAK33714.1|
gi|13621953|gb|AAK33716.1|
gi|13621954|gb|AAK33717.1|
gi|13621955|gb|AAK33718.1|
gi|13621956|gb|AAK33719.1|
gi|13621957|gb|AAK33720.1|
gi|13621958|gb|AAK33721.1|
gi|13621959|gb|AAK33722.1|
gi|13621961|gb|AAK33723.1|
gi|13621975|gb|AAK33736.1|
gi|13621977|gb|AAK33738.1|
gi|13621978|gb|AAK33739.1|
gi|13621979|gb|AAK33740.1|
gi|13621980|gb|AAK33741.1|
gi|13621981|gb|AAK33742.1|
gi|13621982|gb|AAK33743.1|
gi|13621985|gb|AAK33745.1|
gi|13621986|gb|AAK33746.1|
gi|13621987|gb|AAK33747.1|
gi|13621989|gb|AAK33749.1|
gi|13621990|gb|AAK33750.1|
gi|13621992|gb|AAK33752.1|
gi|13621993|gb|AAK33753.1|
gi|13621994|gb|AAK33754.1|
gi|13621996|gb|AAK33755.1|
gi|13621997|gb|AAK33756.1|
gi|13621998|gb|AAK33757.1|
gi|13621999|gb|AAK33758.1|
gi|13622000|gb|AAK33759.1|
gi|13622001|gb|AAK33760.1|
gi|13622002|gb|AAK33761.1|
gi|13622003|gb|AAK33762.1|
gi|13622004|gb|AAK33763.1|
gi|13622005|gb|AAK33764.1|
gi|13622006|gb|AAK33765.1|
gi|13622008|gb|AAK33766.1|
gi|13622009|gb|AAK33767.1|
gi|13622010|gb|AAK33768.1|
gi|13622012|gb|AAK33770.1|
gi|13622013|gb|AAK33771.1|
gi|13622017|gb|AAK33774.1|
gi|13622018|gb|AAK33775.1|
gi|13622019|gb|AAK33776.1|
gi|13622020|gb|AAK33777.1|
gi|13622021|gb|AAK33778.1|
gi|13622024|gb|AAK33781.1|
gi|13622025|gb|AAK33782.1|
gi|13622026|gb|AAK33783.1|
gi|13622031|gb|AAK33787.1|
gi|13622032|gb|AAK33788.1|
gi|13622033|gb|AAK33789.1|
gi|13622034|gb|AAK33790.1|
gi|13622035|gb|AAK33791.1|
gi|13622039|gb|AAK33794.1|
gi|13622041|gb|AAK33796.1|
gi|13622042|gb|AAK33797.1|
gi|13622043|gb|AAK33798.1|
gi|13622044|gb|AAK33799.1|
gi|13622045|gb|AAK33800.1|
gi|13622046|gb|AAK33801.1|
gi|13622048|gb|AAK33802.1|
gi|13622049|gb|AAK33803.1|
gi|13622050|gb|AAK33804.1|
gi|13622051|gb|AAK33805.1|
gi|13622052|gb|AAK33806.1|
gi|13622054|gb|AAK33808.1|
gi|13622055|gb|AAK33809.1|
gi|13622056|gb|AAK33810.1|
gi|13622058|gb|AAK33812.1|
gi|13622060|gb|AAK33813.1|
gi|13622062|gb|AAK33815.1|
gi|13622064|gb|AAK33817.1|
gi|13622065|gb|AAK33818.1|
gi|13622068|gb|AAK33821.1|
gi|13622069|gb|AAK33822.1|
gi|13622070|gb|AAK33823.1|
gi|13622071|gb|AAK33824.1|
gi|13622073|gb|AAK33825.1|
gi|13622074|gb|AAK33826.1|
gi|13622075|gb|AAK33827.1|
gi|13622077|gb|AAK33829.1|
gi|13622079|gb|AAK33831.1|
gi|13622083|gb|AAK33834.1|
gi|13622085|gb|AAK33836.1|
gi|13622086|gb|AAK33837.1|
gi|13622087|gb|AAK33838.1|
gi|13622088|gb|AAK33839.1|
gi|13622089|gb|AAK33840.1|
gi|13622090|gb|AAK33841.1|
gi|13622091|gb|AAK33842.1|
gi|13622092|gb|AAK33843.1|
gi|13622093|gb|AAK33844.1|
gi|13622095|gb|AAK33845.1|
gi|13622096|gb|AAK33846.1|
gi|13622097|gb|AAK33847.1|
gi|13622162|gb|AAK33908.1|
gi|13622163|gb|AAK33909.1|
gi|13622164|gb|AAK33910.1|
gi|13622165|gb|AAK33911.1|
gi|13622166|gb|AAK33912.1|
gi|13622169|gb|AAK33914.1|
gi|13622170|gb|AAK33915.1|
gi|13622171|gb|AAK33916.1|
gi|13622172|gb|AAK33917.1|
gi|13622174|gb|AAK33919.1|
gi|13622175|gb|AAK33920.1|
gi|13622176|gb|AAK33921.1|
gi|13622177|gb|AAK33922.1|
gi|13622179|gb|AAK33923.1|
gi|13622180|gb|AAK33924.1|
gi|13622181|gb|AAK33925.1|
gi|13622182|gb|AAK33926.1|
gi|13622183|gb|AAK33927.1|
gi|13622184|gb|AAK33928.1|
gi|13622185|gb|AAK33929.1|
gi|13622186|gb|AAK33930.1|
gi|13622189|gb|AAK33932.1|
gi|13622190|gb|AAK33933.1|
gi|13622191|gb|AAK33934.1|
gi|13622192|gb|AAK33935.1|
gi|13622198|gb|AAK33940.1|
gi|13622200|gb|AAK33942.1|
gi|13622201|gb|AAK33943.1|
gi|13622204|gb|AAK33946.1|
gi|13622205|gb|AAK33947.1|
gi|13622207|gb|AAK33949.1|
gi|13622208|gb|AAK33950.1|
gi|13622211|gb|AAK33952.1|
gi|13622213|gb|AAK33954.1|
gi|13622214|gb|AAK33955.1|
gi|13622215|gb|AAK33956.1|
gi|13622216|gb|AAK33957.1|
gi|13622217|gb|AAK33958.1|
gi|13622218|gb|AAK33959.1|
gi|13622219|gb|AAK33960.1|
gi|13622222|gb|AAK33962.1|
gi|13622223|gb|AAK33963.1|
gi|13622224|gb|AAK33964.1|
gi|13622233|gb|AAK33972.1|
gi|13622235|gb|AAK33974.1|
gi|13622236|gb|AAK33975.1|
gi|13622237|gb|AAK33976.1|
gi|13622239|gb|AAK33978.1|
gi|13622240|gb|AAK33979.1|
gi|13622241|gb|AAK33980.1|
gi|13622242|gb|AAK33981.1|
gi|13622243|gb|AAK33982.1|
gi|13622244|gb|AAK33983.1|
gi|13622250|gb|AAK33988.1|
gi|13622252|gb|AAK33990.1|
gi|13622253|gb|AAK33991.1|
gi|13622255|gb|AAK33993.1|
gi|13622256|gb|AAK33994.1|
gi|13622257|gb|AAK33995.1|
gi|13622259|gb|AAK33996.1|
gi|13622260|gb|AAK33997.1|
gi|13622261|gb|AAK33998.1|
gi|13622262|gb|AAK33999.1|
gi|13622263|gb|AAK34000.1|
gi|13622264|gb|AAK34001.1|
gi|13622265|gb|AAK34002.1|
gi|13622266|gb|AAK34003.1|
gi|13622268|gb|AAK34005.1|
gi|13622269|gb|AAK34006.1|
gi|13622271|gb|AAK34007.1|
gi|13622272|gb|AAK34008.1|
gi|13622273|gb|AAK34009.1|
gi|13622274|gb|AAK34010.1|
gi|13622275|gb|AAK34011.1|
gi|13622276|gb|AAK34012.1|
gi|13622277|gb|AAK34013.1|
gi|13622278|gb|AAK34014.1|
gi|13622279|gb|AAK34015.1|
gi|13622281|gb|AAK34017.1|
gi|13622282|gb|AAK34018.1|
gi|13622283|gb|AAK34019.1|
gi|13622284|gb|AAK34020.1|
gi|13622285|gb|AAK34021.1|
gi|13622287|gb|AAK34022.1|
gi|13622288|gb|AAK34023.1|
gi|13622289|gb|AAK34024.1|
gi|13622290|gb|AAK34025.1|
gi|13622294|gb|AAK34029.1|
gi|13622295|gb|AAK34030.1|
gi|13622296|gb|AAK34031.1|
gi|13622297|gb|AAK34032.1|
gi|13622298|gb|AAK34033.1|
gi|13622299|gb|AAK34034.1|
gi|13622301|gb|AAK34035.1|
gi|13622306|gb|AAK34040.1|
gi|13622326|gb|AAK34058.1|
gi|13622328|gb|AAK34060.1|
gi|13622329|gb|AAK34061.1|
gi|13622330|gb|AAK34062.1|
gi|13622332|gb|AAK34064.1|
gi|13622333|gb|AAK34065.1|
gi|13622335|gb|AAK34066.1|
gi|13622338|gb|AAK34069.1|
gi|13622339|gb|AAK34070.1|
gi|13622340|gb|AAK34071.1|
gi|13622341|gb|AAK34072.1|
gi|13622343|gb|AAK34073.1|
gi|13622350|gb|AAK34080.1|
gi|13622351|gb|AAK34081.1|
gi|13622352|gb|AAK34082.1|
gi|13622353|gb|AAK34083.1|
gi|13622355|gb|AAK34084.1|
gi|13622356|gb|AAK34085.1|
gi|13622357|gb|AAK34086.1|
gi|13622358|gb|AAK34087.1|
gi|13622359|gb|AAK34088.1|
gi|13622360|gb|AAK34089.1|
gi|13622361|gb|AAK34090.1|
gi|13622362|gb|AAK34091.1|
gi|13622363|gb|AAK34092.1|
gi|13622364|gb|AAK34093.1|
gi|13622366|gb|AAK34094.1|
gi|13622367|gb|AAK34095.1|
gi|13622368|gb|AAK34096.1|
gi|13622369|gb|AAK34097.
gi|13622370|gb|AAK34098.
gi|13622371|gb|AAK34099.
gi|13622372|gb|AAK34100.
gi|13622373|gb|AAK34101.
gi|13622374|gb|AAK34102.
gi|13622375|gb|AAK34103.
gi|13622376|gb|AAK34104.
gi|13622377|gb|AAK34105.
gi|13622378|gb|AAK34106.
gi|13622380|gb|AAK34107.
gi|13622383|gb|AAK34110.
gi|13622384|gb|AAK34111.
gi|13622387|gb|AAK34114.
gi|13622389|gb|AAK34116.
gi|13622394|gb|AAK34120.
gi|13622395|gb|AAK34121.
gi|13622396|gb|AAK34122.
gi|13622398|gb|AAK34124.
gi|13622399|gb|AAK34125.
gi|13622400|gb|AAK34126.
gi|13622401|gb|AAK34127.
gi|13622403|gb|AAK34128.
gi|13622405|gb|AAK34130.
gi|13622406|gb|AAK34131.
gi|13622407|gb|AAK34132.
gi|13622408|gb|AAK34133.
gi|13622415|gb|AAK34139.
gi|13622416|gb|AAK34140.
gi|13622417|gb|AAK34141.
gi|13622419|gb|AAK34143.
gi|13622420|gb|AAK34144.
gi|13622424|gb|AAK34147.
gi|13622425|gb|AAK34148.
gi|13622431|gb|AAK34153.
gi|13622432|gb|AAK34154.
gi|13622433|gb|AAK34155.
gi|13622434|gb|AAK34156.
gi|13622435|gb|AAK34157.
gi|13622436|gb|AAK34158.
gi|13622437|gb|AAK34159.
gi|13622444|gb|AAK34165.
gi|13622447|gb|AAK34168.
gi|13622450|gb|AAK34170.
gi|13622451|gb|AAK34171.
gi|13622455|gb|AAK34175.
gi|13622457|gb|AAK34177.
gi|13622458|gb|AAK34178.
gi|13622460|gb|AAK34179.
gi|13622461|gb|AAK34180.
gi|13622462|gb|AAK34181.
gi|13622463|gb|AAK34182.
gi|13622464|gb|AAK34183.
gi|13622465|gb|AAK34184.
gi|13622467|gb|AAK34186.
gi|13622468|gb|AAK34187.
gi|13622471|gb|AAK34189.1|
gi|13622473|gb|AAK34191.1|
gi|13622474|gb|AAK34192.1|
gi|13622477|gb|AAK34195.1|
gi|13622478|gb|AAK34196.1|
gi|13622479|gb|AAK34197.1|
gi|13622481|gb|AAK34198.1|
gi|13622482|gb|AAK34199.1|
gi|13622483|gb|AAK34200.1|
gi|13622484|gb|AAK34201.1|
gi|13622485|gb|AAK34202.1|
gi|13622486|gb|AAK34203.1|
gi|13622491|gb|AAK34207.1|
gi|13622492|gb|AAK34208.1|
gi|13622493|gb|AAK34209.1|
gi|13622494|gb|AAK34210.1|
gi|13622495|gb|AAK34211.1|
gi|13622496|gb|AAK34212.1|
gi|13622497|gb|AAK34213.1|
gi|13622499|gb|AAK34214.1|
gi|13622500|gb|AAK34215.1|
gi|13622501|gb|AAK34216.1|
gi|13622506|gb|AAK34221.1|
gi|13622507|gb|AAK34222.1|
gi|13622508|gb|AAK34223.1|
gi|13622509|gb|AAK34224.1|
gi|13622511|gb|AAK34225.1|
gi|13622512|gb|AAK34226.1|
gi|13622513|gb|AAK34227.1|
gi|13622515|gb|AAK34229.1|
gi|13622516|gb|AAK34230.1|
gi|13622517|gb|AAK34231.1|
gi|13622518|gb|AAK34232.1|
gi|13622520|gb|AAK34233.1|
gi|13622521|gb|AAK34234.1|
gi|13622523|gb|AAK34236.1|
gi|13622524|gb|AAK34237.1|
gi|13622525|gb|AAK34238.1|
gi|13622526|gb|AAK34239.1|
gi|13622527|gb|AAK34240.1|
gi|13622579|gb|AAK34289.1|
gi|13622583|gb|AAK34292.1|
gi|13622585|gb|AAK34294.1|
gi|13622587|gb|AAK34296.1|
gi|13622588|gb|AAK34297.1|
gi|13622590|gb|AAK34299.1|
gi|13622591|gb|AAK34300.1|
gi|13622593|gb|AAK34301.1|
gi|13622595|gb|AAK34303.1|
gi|13622596|gb|AAK34304.1|
gi|13622597|gb|AAK34305.1|
gi|13622598|gb|AAK34306.1|
gi|13622599|gb|AAK34307.1|
gi|13622600|gb|AAK34308.1|
gi|13622601|gb|AAK34309.1|
gi|13622603|gb|AAK34310.1|
gi|13622604|gb|AAK34311.1|
gi|13622606|gb|AAK34313.1|
gi|13622607|gb|AAK34314.1|
gi|13622608|gb|AAK34315.1|
gi|13622609|gb|AAK34316.1|
gi|13622610|gb|AAK34317.1|
gi|13622611|gb|AAK34318.1|
gi|13622612|gb|AAK34319.1|
gi|13622615|gb|AAK34321.1|
gi|13622616|gb|AAK34322.1|
gi|13622617|gb|AAK34323.1|
gi|13622618|gb|AAK34324.1|
gi|13622621|gb|AAK34327.1|
gi|13622622|gb|AAK34328.1|
gi|13622623|gb|AAK34329.1|
gi|13622624|gb|AAK34330.1|
gi|13622625|gb|AAK34331.1|
gi|13622626|gb|AAK34332.1|
gi|13622628|gb|AAK34333.1|
gi|13622629|gb|AAK34334.1|
gi|13622630|gb|AAK34335.1|
gi|13622631|gb|AAK34336.1|
gi|13622632|gb|AAK34337.1|
gi|13622634|gb|AAK34339.1|
gi|13622636|gb|AAK34341.1|
gi|13622640|gb|AAK34344.1|
gi|13622641|gb|AAK34345.1|
gi|13622652|gb|AAK34355.1|
gi|13622653|gb|AAK34356.1|
gi|13622654|gb|AAK34357.1|
gi|13622656|gb|AAK34359.1|
gi|13622660|gb|AAK34363.1|
gi|13622665|gb|AAK34367.1|
gi|13622668|gb|AAK34370.1|
gi|13622675|gb|AAK34376.1|
gi|13622676|gb|AAK34377.1|
gi|13622683|gb|AAK34383.1|
gi|13622684|gb|AAK34384.1|
gi|13622685|gb|AAK34385.1|
gi|13622688|gb|AAK34387.1|
gi|13622689|gb|AAK34388.1|
gi|13622690|gb|AAK34389.1|
gi|13622691|gb|AAK34390.1|
gi|13622692|gb|AAK34391.1|
gi|13622693|gb|AAK34392.1|
gi|13622694|gb|AAK34393.1|
gi|13622695|gb|AAK34394.1|
gi|13622696|gb|AAK34395.1|
gi|13622698|gb|AAK34396.1|
gi|13622699|gb|AAK34397.1|
gi|13622700|gb|AAK34398.1|
gi|13622701|gb|AAK34399.1|
gi|13622702|gb|AAK34400.1|
gi|13622703|gb|AAK34401.1|
gi|13622704|gb|AAK34402.1|
gi|13622705|gb|AAK34403.1|
gi|13622711|gb|AAK34408.1|
gi|13622713|gb|AAK34410.1|
gi|13622714|gb|AAK34411.1|
gi|13622715|gb|AAK34412.1|
gi|13622718|gb|AAK34414.1|
gi|13622719|gb|AAK34415.1|
gi|13622720|gb|AAK34416.1|
gi|13622721|gb|AAK34417.1|
gi|13622722|gb|AAK34418.1|
gi|13622723|gb|AAK34419.1|
gi|13622727|gb|AAK34422.1|
gi|13622728|gb|AAK34423.1|
gi|13622729|gb|AAK34424.1|
gi|13622730|gb|AAK34425.1|
gi|13622731|gb|AAK34426.1|
gi|13622733|gb|AAK34428.1|
gi|13622734|gb|AAK34429.1|
gi|13622735|gb|AAK34430.1|
gi|13622736|gb|AAK34431.1|
gi|13622737|gb|AAK34432.1|
gi|13622740|gb|AAK34434.1|
gi|13622741|gb|AAK34435.1|
gi|13622742|gb|AAK34436.1|
gi|13622744|gb|AAK34438.1|
gi|13622745|gb|AAK34439.1|
gi|13622746|gb|AAK34440.1|
gi|13622749|gb|AAK34442.1|
gi|13622750|gb|AAK34443.1|
gi|13622751|gb|AAK34444.1|
gi|13622752|gb|AAK34445.1|
gi|13622753|gb|AAK34446.1|
gi|13622754|gb|AAK34447.1|
gi|13622760|gb|AAK34452.1|
gi|13622762|gb|AAK34454.1|
gi|13622763|gb|AAK34455.1|
gi|13622764|gb|AAK34456.1|
gi|13622765|gb|AAK34457.1|
gi|13622766|gb|AAK34458.1|
gi|13622767|gb|AAK34459.1|
gi|13622768|gb|AAK34460.1|
gi|13622770|gb|AAK34462.1|
gi|13622771|gb|AAK34463.1|
gi|13622774|gb|AAK34465.1|
gi|13622775|gb|AAK34466.1|
gi|13622776|gb|AAK34467.1|
gi|13622777|gb|AAK34468.1|
gi|13622778|gb|AAK34469.1|
gi|13622779|gb|AAK34470.1|
gi|13622780|gb|AAK34471.1|
gi|13622781|gb|AAK34472.1|
gi|13622782|gb|AAK34473.1|
gi|13622783|gb|AAK34474.1|
gi|13622785|gb|AAK34475.1|
gi|13622787|gb|AAK34477.1|
gi|13622789|gb|AAK34479.1|
gi|13622790|gb|AAK34480.1|
gi|13622791|gb|AAK34481.1|
gi|13622792|gb|AAK34482.1|
gi|13622793|gb|AAK34483.1|
gi|13622794|gb|AAK34484.1|
gi|13622795|gb|AAK34485.1|
gi|13622796|gb|AAK34486.1|
gi|13622798|gb|AAK34487.1|
gi|13622799|gb|AAK34488.1|
gi|13622800|gb|AAK34489.1|
gi|13622801|gb|AAK34490.1|
gi|13622802|gb|AAK34491.1|
gi|13622803|gb|AAK34492.1|
gi|13622804|gb|AAK34493.1|
gi|13622805|gb|AAK34494.1|
gi|13622806|gb|AAK34495.1|
gi|13622807|gb|AAK34496.1|
gi|13622808|gb|AAK34497.1|
gi|13622809|gb|AAK34498.1|
gi|13622810|gb|AAK34499.1|
gi|13622812|gb|AAK34500.1|
gi|13622813|gb|AAK34501.1|
gi|13622814|gb|AAK34502.1|
gi|13622815|gb|AAK34503.1|
gi|13622818|gb|AAK34506.1|
gi|13622821|gb|AAK34509.1|
gi|13622822|gb|AAK34510.1|
gi|13622823|gb|AAK34511.1|
gi|13622825|gb|AAK34512.1|
gi|13622826|gb|AAK34513.1|
gi|13622827|gb|AAK34514.1|
gi|13622828|gb|AAK34515.1|
gi|13622829|gb|AAK34516.1|
gi|13622830|gb|AAK34517.1|
gi|13622833|gb|AAK34520.1|
gi|13622838|gb|AAK34524.1|
gi|13622839|gb|AAK34525.1|
gi|13622840|gb|AAK34526.1|
gi|13622841|gb|AAK34527.1|
gi|13622847|gb|AAK34532.1|
gi|13622848|gb|AAK34533.1|
gi|13622849|gb|AAK34534.1|
gi|13622853|gb|AAK34537.1|
gi|13622854|gb|AAK34538.1|
gi|13622856|gb|AAK34540.1|
gi|13622857|gb|AAK34541.1|
gi|13622858|gb|AAK34542.1|
gi|13622860|gb|AAK34543.1|
gi|13622861|gb|AAK34544.1|
gi|13622862|gb|AAK34545.1|
gi|13622863|gb|AAK34546.1|
gi|13622864|gb|AAK34547.1|
gi|13622865|gb|AAK34548.1|
gi|13622866|gb|AAK34549.1|
gi|13622867|gb|AAK34550.1|
gi|13622868|gb|AAK34551.1|
gi|13622869|gb|AAK34552.1|
gi|13622870|gb|AAK34553.1|
gi|13622873|gb|AAK34555.1|
gi|13622875|gb|AAK34557.1|
gi|13622876|gb|AAK34558.1|
gi|13622877|gb|AAK34559.1|
gi|13622878|gb|AAK34560.1|
gi|13622879|gb|AAK34561.1|
gi|13622880|gb|AAK34562.1|
gi|13622881|gb|AAK34563.1|
gi|13622882|gb|AAK34564.1|
gi|13622885|gb|AAK34566.1|
gi|13622886|gb|AAK34567.1|
gi|13622887|gb|AAK34568.1|
gi|13622888|gb|AAK34569.1|
gi|13622890|gb|AAK34571.1|
gi|13622893|gb|AAK34574.1|
gi|13622896|gb|AAK34576.1|
gi|13622898|gb|AAK34578.1|
gi|13622899|gb|AAK34579.1|
gi|13622900|gb|AAK34580.1|
gi|13622901|gb|AAK34581.1|
gi|13622903|gb|AAK34583.1|
gi|13622905|gb|AAK34585.1|
gi|13622906|gb|AAK34586.1|
gi|13622907|gb|AAK34587.1|
gi|13622908|gb|AAK34588.1|
gi|13622910|gb|AAK34589.1|
gi|13622911|gb|AAK34590.1|
gi|13622912|gb|AAK34591.1|
gi|13622913|gb|AAK34592.1|
gi|13622914|gb|AAK34593.1|
gi|13622915|gb|AAK34594.1|
gi|13622917|gb|AAK34596.1|
gi|13622918|gb|AAK34597.1|
gi|13622919|gb|AAK34598.1|
gi|13622921|gb|AAK34599.1|
gi|13622922|gb|AAK34600.1|
gi|13622924|gb|AAK34602.1|
gi|13622925|gb|AAK34603.1|
gi|13622926|gb|AAK34604.1|
gi|13622927|gb|AAK34605.1|
gi|13622928|gb|AAK34606.1|
gi|13622929|gb|AAK34607.1|
gi|13622930|gb|AAK34608.1|
gi|13622931|gb|AAK34609.1|
gi|13622933|gb|AAK34610.1|
gi|13622941|gb|AAK34617.1|
gi|13622944|gb|AAK34620.1|
gi|13622945|gb|AAK34621.1|
gi|13622947|gb|AAK34623.1|
gi|13622948|gb|AAK34624.1|
gi|13622949|gb|AAK34625.1|
gi|13622950|gb|AAK34626.1|
gi|13622952|gb|AAK34627.1|
gi|13622955|gb|AAK34630.1|
gi|13622956|gb|AAK34631.1|
gi|13622959|gb|AAK34634.1|
gi|13622961|gb|AAK34636.1|
gi|13622963|gb|AAK34638.1|
gi|13622964|gb|AAK34639.1|
gi|13622967|gb|AAK34641.1|
gi|13622969|gb|AAK34643.1|
gi|13622971|gb|AAK34645.1|
gi|13622973|gb|AAK34647.1|
gi|13622974|gb|AAK34648.1|
gi|13622977|gb|AAK34651.1|
gi|13622981|gb|AAK34654.1|
gi|13622982|gb|AAK34655.1|
gi|13622983|gb|AAK34656.1|
gi|13622984|gb|AAK34657.1|
gi|13622985|gb|AAK34658.1|
gi|13622989|gb|AAK34661.1|
gi|13622990|gb|AAK34662.1|
gi|13622991|gb|AAK34663.1|
gi|13622992|gb|AAK34664.1|
gi|13622995|gb|AAK34666.1|
gi|13622996|gb|AAK34667.1|
gi|13622998|gb|AAK34669.1|
gi|13622999|gb|AAK34670.1|
gi|13623000|gb|AAK34671.1|
gi|13623001|gb|AAK34672.1|
gi|13623002|gb|AAK34673.1|
gi|13623004|gb|AAK34674.1|
gi|13623005|gb|AAK34675.1|
gi|13623006|gb|AAK34676.1|
gi|13623007|gb|AAK34677.1|
gi|13623009|gb|AAK34679.1|
gi|13623019|gb|AAK34688.1|
gi|13623020|gb|AAK34689.1|
gi|13623030|gb|AAK34698.1|
gi|13623031|gb|AAK34699.1|
gi|13623032|gb|AAK34700.1|
gi|13623033|gb|AAK34701.1|
gi|13623038|gb|AAK34705.1|
gi|13623045|gb|AAK34712.1|
gi|13623046|gb|AAK34713.1|
gi|13623047|gb|AAK34714.1|
gi|13623049|gb|AAK34715.1|
gi|13623050|gb|AAK34716.1|
gi|13623051|gb|AAK34717.1|
gi|13623052|gb|AAK34718.1|
gi|13623053|gb|AAK34719.1|
gi|13623054|gb|AAK34720.1|
gi|13623056|gb|AAK34722.1|
gi|13623058|gb|AAK34724.1|
gi|13623062|gb|AAK34727.1|
gi|13623064|gb|AAK34729.1|
gi|13623065|gb|AAK34730.1|
gi|13623069|gb|AAK34733.1|
gi|13623074|gb|AAK34738.1|
gi|13623081|gb|AAK34744.1|
gi|13623082|gb|AAK34745.1|
gi|13623083|gb|AAK34746.1|
gi|13623085|gb|AAK34747.1|
gi|13623086|gb|AAK34748.1|
gi|13623088|gb|AAK34750.1|
gi|13623089|gb|AAK34751.1|
gi|13623090|gb|AAK34752.1|
gi|13623091|gb|AAK34753.1|
gi|13623093|gb|AAK34755.1|
gi|13623095|gb|AAK34756.1|
gi|13623096|gb|AAK34757.1|
gi|13623098|gb|AAK34759.1|
gi|13623099|gb|AAK34760.1|
gi|13623100|gb|AAK34761.1|
gi|13623102|gb|AAK34763.1|
gi|13623103|gb|AAK34764.1|
gi|13623105|gb|AAK34766.1|
gi|13623107|gb|AAK34767.1|
gi|13623128|gb|AAK34787.1|
gi|13623129|gb|AAK34788.1|
gi|13623131|gb|AAK34790.1|
gi|13623132|gb|AAK34791.1|
gi|13623133|gb|AAK34792.1|
gi|13623134|gb|AAK34793.1|
gi|13623136|gb|AAK34794.1|
gi|13623138|gb|AAK34796.1|
gi|13623139|gb|AAK34797.1|
gi|13623150|gb|AAK34807.1|
gi|13623151|gb|AAK34808.1|
gi|13623152|gb|AAK34809.1|
gi|13623154|gb|AAK34811.1|
gi|13623155|gb|AAK34812.1|
gi|13623156|gb|AAK34813.1|
gi|13623157|gb|AAK34814.1|
gi|13623159|gb|AAK34815.1|
gi|13623161|gb|AAK34817.1|
gi|13623162|gb|AAK34818.1|
gi|13623163|gb|AAK34819.1|
gi|13623165|gb|AAK34821.1|
gi|13623166|gb|AAK34822.1|
gi|13623167|gb|AAK34823.1|
gi|13623168|gb|AAK34824.1|
gi|13623170|gb|AAK34826.1|
gi|13623171|gb|AAK34827.1|
gi|13623175|gb|AAK34830.1|
gi|13623176|gb|AAK34831.1|
gi|13623177|gb|AAK34832.1|
gi|13623179|gb|AAK34834.1|
gi|13623180|gb|AAK34835.1|
gi|13623182|gb|AAK34836.1|
gi|13623183|gb|AAK34837.1|
gi|13623184|gb|AAK34838.1|
gi|13623185|gb|AAK34839.1|
gi|13623186|gb|AAK34840.1|
gi|13623187|gb|AAK34841.1|

Table 34: List of GAS ORF's which are shared with GBS but which do not have homologues with Spn

gi|13621381|gb|AAK33195.1|
gi|13621423|gb|AAK33233.1|
gi|13621440|gb|AAK33249.1|
gi|13621443|gb|AAK33251.1|
gi|13621453|gb|AAK33260.1|
gi|13621454|gb|AAK33261.1|
gi|13621479|gb|AAK33284.1|
gi|13621482|gb|AAK33287.1|
gi|13621492|gb|AAK33296.1|
gi|13621493|gb|AAK33297.1|
gi|13621497|gb|AAK33300.1|
gi|13621498|gb|AAK33301.1|
gi|13621512|gb|AAK33314.1|
gi|13621514|gb|AAK33316.1|
gi|13621556|gb|AAK33354.1|
gi|13621570|gb|AAK33366.1|
gi|13621587|gb|AAK33382.1|
gi|13621610|gb|AAK33403.1|
gi|13621613|gb|AAK33405.1|
gi|13621626|gb|AAK33418.1|
gi|13621632|gb|AAK33423.1|
gi|13621635|gb|AAK33426.1|
gi|13621643|gb|AAK33433.1|
gi|13621655|gb|AAK33444.1|
gi|13621656|gb|AAK33445.1|
gi|13621659|gb|AAK33448.1|
gi|13621673|gb|AAK33461.1|
gi|13621686|gb|AAK33473.1|
gi|13621696|gb|AAK33482.1|
gi|13621703|gb|AAK33488.1|
gi|13621712|gb|AAK33497.1|
gi|13621728|gb|AAK33511.1|
gi|13621738|gb|AAK33520.1|
gi|13621739|gb|AAK33521.1|
gi|13621740|gb|AAK33522.1|
gi|13621772|gb|AAK33551.1|
gi|13621776|gb|AAK33555.1|
gi|13621791|gb|AAK33569.1|
gi|13621798|gb|AAK33575.1|
gi|13621801|gb|AAK33578.1|
gi|13621803|gb|AAK33580.1|
gi|13621804|gb|AAK33581.1|
gi|13621832|gb|AAK33606.1|
gi|13621833|gb|AAK33607.1|
gi|13621896|gb|AAK33665.1|
gi|13621897|gb|AAK33666.1|
gi|13621906|gb|AAK33674.1|
gi|13621911|gb|AAK33679.1|
gi|13621949|gb|AAK33713.1|
gi|13621951|gb|AAK33715.1|
gi|13621962|gb|AAK33724.1|
gi|13621963|gb|AAK33725.1|
gi|13621964|gb|AAK33726.1|
gi|13621971|gb|AAK33732.1|
gi|13621976|gb|AAK33737.1|
gi|13621983|gb|AAK33744.1|
gi|13621988|gb|AAK33748.1|
gi|13622014|gb|AAK33772.1|
gi|13622015|gb|AAK33773.1|
gi|13622022|gb|AAK33779.1|
gi|13622023|gb|AAK33780.1|
gi|13622028|gb|AAK33784.1|
gi|13622029|gb|AAK33785.1|
gi|13622037|gb|AAK33792.1|
gi|13622038|gb|AAK33793.1|
gi|13622040|gb|AAK33795.1|
gi|13622057|gb|AAK33811.1|
gi|13622061|gb|AAK33814.1|
gi|13622063|gb|AAK33816.1|
gi|13622066|gb|AAK33819.1|
gi|13622067|gb|AAK33820.1|
gi|13622076|gb|AAK33828.1|
gi|13622078|gb|AAK33830.1|
gi|13622084|gb|AAK33835.1|
gi|13622098|gb|AAK33848.1|
gi|13622099|gb|AAK33849.1|
gi|13622100|gb|AAK33850.1|
gi|13622104|gb|AAK33854.1|
gi|13622110|gb|AAK33859.1|
gi|13622116|gb|AAK33865.1|
gi|13622124|gb|AAK33873.1|
gi|13622159|gb|AAK33905.1|
gi|13622193|gb|AAK33936.1|
gi|13622194|gb|AAK33937.1|
gi|13622195|gb|AAK33938.1|
gi|13622196|gb|AAK33939.1|
gi|13622202|gb|AAK33944.1|
gi|13622203|gb|AAK33945.1|
gi|13622206|gb|AAK33948.1|
gi|13622210|gb|AAK33951.1|
gi|13622221|gb|AAK33961.1|
gi|13622231|gb|AAK33971.1|
gi|13622234|gb|AAK33973.1|
gi|13622238|gb|AAK33977.1|
gi|13622245|gb|AAK33984.1|
gi|13622246|gb|AAK33985.1|
gi|13622248|gb|AAK33986.1|
gi|13622249|gb|AAK33987.1|
gi|13622251|gb|AAK33989.1|
gi|13622254|gb|AAK33992.1|
gi|13622267|gb|AAK34004.1|
gi|13622291|gb|AAK34026.1|
gi|13622302|gb|AAK34036.1|
gi|13622303|gb|AAK34037.1|
gi|13622304|gb|AAK34038.1|
gi|13622327|gb|AAK34059.1|
gi|13622344|gb|AAK34074.1|
gi|13622345|gb|AAK34075.1|
gi|13622346|gb|AAK34076.1|
gi|13622347|gb|AAK34077.1|
gi|13622348|gb|AAK34078.1|
gi|13622349|gb|AAK34079.1|
gi|13622382|gb|AAK34109.1|
gi|13622386|gb|AAK34113.1|
gi|13622391|gb|AAK34118.1|
gi|13622392|gb|AAK34119.1|
gi|13622397|gb|AAK34123.1|
gi|13622404|gb|AAK34129.1|
gi|13622412|gb|AAK34136.1|
gi|13622413|gb|AAK34137.1|
gi|13622414|gb|AAK34138.1|
gi|13622418|gb|AAK34142.1|
gi|13622430|gb|AAK34152.1|
gi|13622446|gb|AAK34167.1|
gi|13622449|gb|AAK34169.1|
gi|13622453|gb|AAK34173.1|
gi|13622470|gb|AAK34188.1|
gi|13622487|gb|AAK34204.1|
gi|13622490|gb|AAK34206.1|
gi|13622502|gb|AAK34217.1|
gi|13622503|gb|AAK34218.1|
gi|13622514|gb|AAK34228.1|
gi|13622528|gb|AAK34241.1|
gi|13622540|gb|AAK34252.1|
gi|13622541|gb|AAK34253.1|
gi|13622544|gb|AAK34255.1|
gi|13622545|gb|AAK34256.1|
gi|13622546|gb|AAK34257.1|
gi|13622547|gb|AAK34258.1|
gi|13622548|gb|AAK34259.1|
gi|13622550|gb|AAK34261.1|
gi|13622551|gb|AAK34262.1|
gi|13622552|gb|AAK34263.1|
gi|13622556|gb|AAK34267.1|
gi|13622557|gb|AAK34268.1|
gi|13622558|gb|AAK34269.1|
gi|13622559|gb|AAK34270.1|
gi|13622563|gb|AAK34273.1|
gi|13622571|gb|AAK34281.1|
gi|13622576|gb|AAK34286.1|
gi|13622581|gb|AAK34290.1|
gi|13622582|gb|AAK34291.1|
gi|13622586|gb|AAK34295.1|
gi|13622589|gb|AAK34298.1|
gi|13622605|gb|AAK34312.1|
gi|13622633|gb|AAK34338.1|
gi|13622635|gb|AAK34340.1|
gi|13622637|gb|AAK34342.1|
gi|13622638|gb|AAK34343.1|
gi|13622657|gb|AAK34360.1|
gi|13622707|gb|AAK34404.1|
gi|13622716|gb|AAK34413.1|
gi|13622724|gb|AAK34420.1|
gi|13622732|gb|AAK34427.1|
gi|13622743|gb|AAK34437.1|
gi|13622761|gb|AAK34453.1|
gi|13622773|gb|AAK34464.1|
gi|13622788|gb|AAK34478.1|
gi|13622816|gb|AAK34504.1|
gi|13622817|gb|AAK34505.1|
gi|13622846|gb|AAK34531.1|
gi|13622852|gb|AAK34536.1|
gi|13622874|gb|AAK34556.1|
gi|13622889|gb|AAK34570.1|
gi|13622891|gb|AAK34572.1|
gi|13622892|gb|AAK34573.1|
gi|13622897|gb|AAK34577.1|
gi|13622902|gb|AAK34582.1|
gi|13622904|gb|AAK34584.1|
gi|13622916|gb|AAK34595.1|
gi|13622923|gb|AAK34601.1|
gi|13622934|gb|AAK34611.1|
gi|13622953|gb|AAK34628.1|
gi|13622954|gb|AAK34629.1|
gi|13622960|gb|AAK34635.1|
gi|13622968|gb|AAK34642.1|
gi|13622980|gb|AAK34653.1|
gi|13622987|gb|AAK34659.1|
gi|13623012|gb|AAK34682.1|
gi|13623013|gb|AAK34683.1|
gi|13623014|gb|AAK34684.1|
gi|13623015|gb|AAK34685.1|
gi|13623016|gb|AAK34686.1|
gi|13623018|gb|AAK34687.1|
gi|13623022|gb|AAK34691.1|
gi|13623029|gb|AAK34697.1|
gi|13623037|gb|AAK34704.1|
gi|13623055|gb|AAK34721.1|
gi|13623060|gb|AAK34725.1|
gi|13623061|gb|AAK34726.1|
gi|13623063|gb|AAK34728.1|
gi|13623066|gb|AAK34731.1|
gi|13623068|gb|AAK34732.1|
gi|13623092|gb|AAK34754.1|
gi|13623097|gb|AAK34758.1|
gi|13623104|gb|AAK34765.1|
gi|13623126|gb|AAK34785.1|
gi|13623130|gb|AAK34789.1|
gi|13623137|gb|AAK34795.1|
gi|13623153|gb|AAK34810.1|
gi|13623164|gb|AAK34820.1|
gi|13623178|gb|AAK34833.1|

List of GAS ORF's which have homologues with pneumococcus but which do not have homologues with GBS

gi|13621338|gb|AAK33157.1|
gi|13621352|gb|AAK33168.1|
gi|13621410|gb|AAK33221.1|
gi|13621433|gb|AAK33242.1|
gi|13621445|gb|AAK33253.1|
gi|13621446|gb|AAK33254.1|
gi|13621447|gb|AAK33255.1|
gi|13621448|gb|AAK33256.1|
gi|13621449|gb|AAK33257.1|
gi|13621451|gb|AAK33259.1|
gi|13621460|gb|AAK33267.1|
gi|13621466|gb|AAK33272.1|
gi|13621489|gb|AAK33293.1|
gi|13621490|gb|AAK33294.1|
gi|13621519|gb|AAK33320.1|
gi|13621520|gb|AAK33321.1|
gi|13621653|gb|AAK33443.1|
gi|13621722|gb|AAK33506.1|
gi|13621723|gb|AAK33507.1|
gi|13621724|gb|AAK33508.1|
gi|13621805|gb|AAK33582.1|
gi|13621900|gb|AAK33669.1|
gi|13622011|gb|AAK33769.1|
gi|13622212|gb|AAK33953.1|
gi|13622280|gb|AAK34016.1|
gi|13622381|gb|AAK34108.1|
gi|13622409|gb|AAK34134.1|
gi|13622410|gb|AAK34135.1|
gi|13622423|gb|AAK34146.1|
gi|13622428|gb|AAK34151.1|
gi|13622441|gb|AAK34162.1|
gi|13622442|gb|AAK34163.1|
gi|13622454|gb|AAK34174.1|
gi|13622456|gb|AAK34176.1|
gi|13622619|gb|AAK34325.1|
gi|13622642|gb|AAK34346.1|
gi|13622643|gb|AAK34347.1|
gi|13622664|gb|AAK34366.1|
gi|13622666|gb|AAK34368.1|
gi|13622667|gb|AAK34369.1|
gi|13622671|gb|AAK34372.1|
gi|13622672|gb|AAK34373.1|
gi|13622673|gb|AAK34374.1|
gi|13622674|gb|AAK34375.1|
gi|13622679|gb|AAK34380.1|
gi|13622680|gb|AAK34381.1|
gi|13622682|gb|AAK34382.1|
gi|13622755|gb|AAK34448.1|
gi|13622758|gb|AAK34450.1|
gi|13622759|gb|AAK34451.1|
gi|13622835|gb|AAK34521.1|
gi|13622837|gb|AAK34523.1|
gi|13622937|gb|AAK34614.1|
gi|13622942|gb|AAK34618.1|
gi|13622946|gb|AAK34622.1|
gi|13622978|gb|AAK34652.1|
gi|13623027|gb|AAK34695.1|
gi|13623087|gb|AAK34749.1|
gi|13623101|gb|AAK34762.1|
gi|13623144|gb|AAK34802.1|
gi|13623146|gb|AAK34804.1|
gi|13623147|gb|AAK34805.1|

Table 36: Spn ORF's which share homologues with GBS and GAS.



SP0001 

SP0002 

SP0003 

SP0004 

SP0005 

SP0006 

SP0007 

SP0008 

SP0010 

SP0011 

SP0013 

SP0014 

SP0019 

SP0021 

SP0024 

SP0027 

SP0032 

SP0033 

SP0034 

SP0035 

SP0036 

SP0037 

SP0042 

SP0044 

SP0045 

SP0046 

SP0047 

SP0048 

SP0051 

SP0053 

SP0054 

SP0056 

SP0063 

SP0073 

SP0074 

SP0078 

SP0079 

SP0083 

SP0084 

SP0085 

SP0095 

SP0105 

SP0106 

SP0111 

SP0112 

SP0118 

SP0120 

SP0121 

SP0122 

SP0127 

SP0128 

SP0129 

SP0148 

SP0149 

SP0151 

SP0152

SP0158 

SP0173 

SP0179 

SP0180 

SP0184 

SP0185 

SP0186 

SP0187 

SP0189 

SP0192 

SP0194 

SP0197 

SP0199 

SP0202 

SP0204 

SP0205 

SP0208 

SP0209 

SP0210 

SP0211 

SP0212 

SP0213 

SP0214 

SP0215 

SP0216 

SP0217 

SP0218 

SP0219 

SP0220 

SP0221 

SP0222 

SP0224 

SP0225 

SP0226 

SP0227 

SP0228 

SP0229 

SP0230 

SP0231 

SP0232 

SP0233 

SP0234 

SP0235 

SP0236 

SP0240 

SP0242 

SP0243 

SP0245 

SP0246 

SP0247 

SP0248 

SP0249 

SP0250 

SP0251 

SP0252 

SP0253

SP0254 
SP0259 
SP0261 
SP0262 
SP0263 
SP0264 
SP0265 
SP0266 
SP0268 
SP0271 
SP0272 
SP0273 
SP0274 
SP0280 
SP0281 
SP0282 
SP0283 
SP0284 
SP0285 
SP0286 
SP0287 
SP0289 
SP0290 
SP0291 
SP0292 
SP0294 
SP0295 
SP0303 
SP0310 
SP0314 
SP0317 
SP0318 
SP0319 
SP0320 
SP0321 
SP0322 
SP0323 
SP0324 
SP0325 
SP0327 
SP0330 
SP0334 
SP0336 
SP0337 
SP0338 
SP0340 
SP0342 
SP0369 
SP0370 
SP0371 
SP0373 
SP0374 
SP0381 
SP0382 
SP0383 
SP0384

SP0385 

SP0386 

SP0387 

SP0400 

SP0401 

SP0402 

SP0403 

SP0404 

SP0405 

SP0406 

SP0408 

SP0410 

SP0411 

SP0412 

SP0415 

SP0416 

SP0417 

SP0418 

SP0419 

SP0420 

SP0421 

SP0422 

SP0423 

SP0424 

SP0425 

SP0426 

SP0427 

SP0433 

SP0434 

SP0435 

SP0436 

SP0437 

SP0438 

SP0439 

SP0441 

SP0442 

SP0443 

SP0452 

SP0453 

SP0454 

SP0457 

SP0458 

SP0459 

SP0461 

SP0466 

SP0467 

SP0474 

SP0477 

SP0478 

SP0483 

SP0486 

SP0488 

SP0489 

SP0493 

SP0494 

SP0499

SP0500 
SP0501 
SP0502 
SP0515 
SP0516 
SP0517 
SP0519 
SP0521 
SP0522 
SP0523 
SP0526 
SP0549 
SP0550 
SP0552 
SP0553 
SP0554 
SP0555 
SP0556 
SP0557 
SP0563 
SP0567 
SP0568 
SP0576 
SP0577 
SP0578 
SP0579 
SP0581 
SP0588 
SP0589 
SP0591 
SP0592 
SP0593 
SP0603 
SP0604 
SP0605 
SP0608 
SP0610 
SP0611 
SP0613 
SP0614 
SP0615 
SP0616 
SP0618 
SP0620 
SP0622 
SP0623 
SP0624 
SP0626 
SP0630 
SP0631 
SP0636 
SP0637 
SP0638 
SP0645 
SP0646 
SP0647

SP0787 

SP0788 

SP0792 

SP0793 

SP0797 

SP0798 

SP0799 

SP0801 

SP0802 

SP0803 

SP0805 

SP0806 

SP0807 

SP0816 

SP0817 

SP0820 

SP0822 

SP0823 

SP0824 

SP0825 

SP0828 

SP0829 

SP0831 

SP0835 

SP0837 

SP0838 

SP0839 

SP0841 

SP0843 

SP0844 

SP0845 

SP0846 

SP0847 

SP0848 

SP0851 

SP0852 

SP0855 

SP0856 

SP0862 

SP0864 

SP0865 

SP0867 

SP0868 

SP0869 

SP0870 

SP0871 

SP0872 

SP0873 

SP0875 

SP0876 

SP0877 

SP0878 

SP0880 

SP0881 

SP0893 

SP0894

SP0895 

SP0896 

SP0897 

SP0904 

SP0905 

SP0908 

SP0909 

SP0912 

SP0923 

SP0927 

SP0928 

SP0929 

SP0931 

SP0932 

SP0933 

SP0935 

SP0936 

SP0937 

SP0938 

SP0943 

SP0944 

SP0945 

SP0946 

SP0947 

SP0948 

SP0954 

SP0955 

SP0959 

SP0960 

SP0961 

SP0962 

SP0964 

SP0966 

SP0967 

SP0968 

SP0969 

SP0970 

SP0971 

SP0972 

SP0974 

SP0975 

SP0976 

SP0978 

SP0979 

SP0980 

SP0981 

SP0984 

SP0985 

SP0987 

SP0988 

SP0989 

SP0991 

SP0992 

SP0993 

SP1002 

SP1003

SP1004 
SP1008 
SP1010 
SP1012 
SP1016 
SP1017 
SP1018 
SP1020 
SP1021 
SP1022 
SP1024 
SP1025 
SP1026 
SP1029 
SP1033 
SP1034 
SP1035 
SP1045 
SP1056 
SP1067 
SP1068 
SP1069 
SP1070 
SP1071 
SP1072 
SP1073 
SP1074 
SP1076 
SP1079 
SP1081 
SP1082 
SP1083 
SP1084 
SP1087 
SP1088 
SP1089 
SP1090 
SP1093 
SP1094 
SP1095 
SP1096 
SP1097 
SP1098 
SP1099 
SP1100 
SP1102 
SP1105 
SP1106 
SP1107 
SP1110 
SP1111 
SP1112 
SP1113 
SP1114 
SP1115 
SP1116

SP1117 

SP1118 

SP1119 

SP1128 

SP1151 

SP1152 

SP1155 

SP1156 

SP1157 

SP1159 

SP1160 

SP1161 

SP1162 

SP1163 

SP1164 

SP1167 

SP1168 

SP1169 

SP1174 

SP1175 

SP1176 

SP1177 

SP1178 

SP1179 

SP1180 

SP1182 

SP1184 

SP1185 

SP1187 

SP1190 

SP1191 

SP1192 

SP1193 

SP1197 

SP1200 

SP1202 

SP1204 

SP1205 

SP1207 

SP1208 

SP1212 

SP1213 

SP1218 

SP1219 

SP1220 

SP1225 

SP1226 

SP1227 

SP1228 

SP1229 

SP1230 

SP1231 

SP1232 

SP1233 

SP1238 

SP1241

SP1242 
SP1244 
SP1245 
SP1246 
SP1247 
SP1248 
SP1249 
SP1260 
SP1263 
SP1266 
SP1275 
SP1276 
SP1277 
SP1278 
SP1279 
SP1280 
SP1283 
SP1284 
SP1285 
SP1286 
SP1287 
SP1288 
SP1289 
SP1290 
SP1291 
SP1293 
SP1297 
SP1298 
SP1299 
SP1308 
SP1316 
SP1324 
SP1329 
SP1330 
SP1331 
SP1336 
SP1341 
SP1354 
SP1355 
SP1357 
SP1358 
SP1359 
SP1362 
SP1368 
SP1370 
SP1371 
SP1372 
SP1374 
SP1375 
SP1376 
SP1377 
SP1378 
SP1380 
SP1381 
SP1383 
SP1386

SP1387 

SP1388 

SP1389 

SP1390 

SP1393 

SP1394 

SP1395 

SP1396 

SP1397 

SP1398 

SP1399 

SP1400 

SP1402 

SP1403 

SP1404 

SP1405 

SP1406

SP1407 

SP1408 

SP1409 

SP1411 

SP1412 

SP1413 

SP1414 

SP1415 

SP1416 

SP1420 

SP1421 

SP1427 

SP1428 

SP1429 

SP1434 

SP1435 

SP1445 

SP1446 

SP1448 

SP1449 

SP1450 

SP1452 

SP1453 

SP1456 

SP1457 

SP1458 

SP1460 

SP1461 

SP1462 

SP1465 

SP1466 

SP1469 

SP1470 

SP1473 

SP1474 

SP1475 

SP1478 

SP1479

SP1482

SP1483 

SP1485 

SP1489 

SP1491 

SP1498 

SP1500 

SP1501 

SP1502 

SP1504 

SP1505 

SP1507 

SP1508 

SP1509 

SP1510 

SP1511 

SP1512 

SP1513 

SP1517 

SP1518 

SP1519 

SP1521 

SP1522 

SP1523 

SP1529 

SP1530 

SP1534 

SP1535 

SP1536 

SP1537 

SP1538 

SP1539 

SP1540 

SP1541 

SP1542 

SP1544 

SP1547 

SP1549 

SP1551 

SP1552 

SP1553 

SP1554 

SP1557 

SP1558 

SP1559 

SP1560 

SP1561 

SP1563 

SP1564 

SP1565 

SP1566 

SP1568 

SP1569 

SP1571 

SP1574 

SP1575 

SP1577

SP1580 

SP1583 

SP1584 

SP1586

SP1587 

SP1588 

SP1589 

SP1590 

SP1591 

SP1597 

SP1598 

SP1599 

SP1602 

SP1603 

SP1606 

SP1608 

SP1609 

SP1610 

SP1615 

SP1616 

SP1617 

SP1624 

SP1625 

SP1626 

SP1631 

SP1633 

SP1638 

SP1644 

SP1645 

SP1646 

SP1647 

SP1648 

SP1649 

SP1650 

SP1652 

SP1653 

SP1655 

SP1659 

SP1661 

SP1662 

SP1664 

SP1665 

SP1666 

SP1667 

SP1668 

SP1670 

SP1671 

SP1672 

SP1674 

SP1675 

SP1676 

SP1677 

SP1681 

SP1682 

SP1683 

SP1684

SP1685 

SP1688 

SP1689 

SP1697 

SP1698 

SP1699 

SP1702 

SP1709 

SP1711 

SP1712 

SP1713 

SP1714 

SP1717 

SP1721 

SP1722 

SP1724 

SP1725 

SP1726 

SP1727 

SP1732 

SP1733 

SP1734 

SP1735 

SP1736 

SP1737 

SP1738 

SP1739 

SP1742 

SP1743 

SP1744 

SP1746 

SP1747 

SP1748 

SP1749 

SP1750 

SP1752 

SP1759 

SP1776 

SP1780 

SP1781 

SP1782 

SP1785 

SP1790 

SP1795 

SP1799 

SP1804 

SP1816 

SP1817 

SP1825 

SP1839 

SP1840 

SP1845 

SP1847 

SP1848 

SP1851 

SP1855

SP1857 

SP1858 

SP1860 

SP1861 

SP1865 

SP1871 

SP1873 

SP1874 

SP1875 

SP1876 

SP1877 

SP1878 

SP1879 

SP1880 

SP1881 

SP1883 

SP1884 

SP1887 

SP1888 

SP1889 

SP1890 

SP1895 

SP1896 

SP1900 

SP1901 

SP1902 

SP1903 

SP1906 

SP1908 

SP1909 

SP1916 

SP1918 

SP1922 

SP1940 

SP1942 

SP1944 

SP1953 

SP1957 

SP1960 

SP1961 

SP1963 

SP1964 

SP1966 

SP1967 

SP1968 

SP1969 

SP1970 

SP1972 

SP1973 

SP1974 

SP1975 

SP1976 

SP1979 

SP1980 

SP1981 

SP1982

SP1983 

SP1984 

SP1985 

SP1987 

SP1989 

SP1990 

SP1991 

SP1993 

SP1994 

SP1996 

SP1997 

SP1998 

SP1999 

SP2006 

SP2007 

SP2010 

SP2011 

SP2012 

SP2020 

SP2021 

SP2022 

SP2027 

SP2028 

SP2030 

SP2031 

SP2032 

SP2033 

SP2034 

SP2035 

SP2036 

SP2037 

SP2038 

SP2040 

SP2041 

SP2042 

SP2044 

SP2045 

SP2048 

SP2052 

SP2053 

SP2054 

SP2055 

SP2056 

SP2057 

SP2058 

SP2063 

SP2065 

SP2069 

SP2070 

SP2072 

SP2073 

SP2075 

SP2077 

SP2078 

SP2082 

SP2083

SP2085 

SP2086 

SP2087 

SP2088 

SP2090 

SP2091 

SP2092 

SP2094 

SP2099 

SP2100 

SP2101 

SP2106 

SP2107 

SP2108 

SP2109 

SP2110 

SP2112 

SP2113 

SP2114 

SP2119 

SP2121 

SP2129 

SP2131 

SP2135 

SP2142 

SP2148 

SP2150 

SP2151 

SP2152 

SP2153 

SP2156 

SP2161 

SP2162 

SP2169 

SP2170 

SP2171 

SP2172 

SP2173 

SP2174 

SP2175 

SP2176 

SP2184 

SP2185 

SP2186 

SP2187 

SP2188 

SP2189 

SP2191 

SP2192 

SP2193 

SP2194 

SP2195 

SP2202 

SP2203 

SP2204 

SP2205

SP2206 

SP2207 

SP2208 

SP2209 

SP2210 

SP2214 

SP2215 

SP2216 

SP2219 

SP2220 

SP2221 

SP2222 

SP2224 

SP2225 

SP2226 

SP2227 

SP2228 

SP2229 

SP2230 

SP2231 

SP2233 

SP2234 

SP2235 

SP2238 

SP2239 

SP2240

Table 37: Spn ORF's which share
homologues with GBS but do not share homologues with GAS

SP0012 
SP0020 
SP0039 
SP0050 
SP0082 
SP0107 
SP0113 
SP0119 
SP0146 
SP0150 
SP0175 
SP0176 
SP0177 
SP0178 
SP0237 
SP0255 
SP0260 
SP0267 
SP0278 
SP0288 
SP0346 
SP0347 
SP0348 
SP0349 
SP0366 
SP0376 
SP0413 
SP0445 
SP0462 
SP0463 
SP0479 
SP0480 
SP0482 
SP0484 
SP0537 
SP0538 
SP0566 
SP0580 
SP0585 
SP0599 
SP0600 
SP0601 
SP0606 
SP0607 
SP0609 
SP0617 
SP0627 
SP0655 
SP0656 
SP0710 
SP0711 
SP0717 
SP0718 
SP0720 
SP0723 
SP0724

SP0725 

SP0730 

SP0739 

SP0749 

SP0750 

SP0751 

SP0752 

SP0753 

SP0754 

SP0769 

SP0789 

SP0791 

SP0826 

SP0900 

SP0913 

SP0914 

SP0939 

SP0941 

SP0942 

SP0953 

SP0973 

SP0977 

SP1011 

SP1013 

SP1027 

SP1054 

SP1055 

SP1080 

SP1086 

SP1121 

SP1122 

SP1123 

SP1124 

SP1126 

SP1127 

SP1137 

SP1166 

SP1173 

SP1194 

SP1195 

SP1215 

SP1240 

SP1256 

SP1261 

SP1271 

SP1272 

SP1273 

SP1274 

SP1306 

SP1310 

SP1332 

SP1333 

SP1334 

SP1346 

SP1348 

SP1350

SP1360 
SP1361 
SP1365 
SP1382 
SP1384 
SP1392 
SP1447 
SP1451 
SP1463 
SP1464 
SP1471 
SP1472 
SP1524 
SP1527 
SP1600 
SP1605 
SP1607 
SP1632 
SP1634 
SP1651 
SP1673 
SP1680 
SP1695 
SP1700 
SP1701 
SP1720 
SP1729 
SP1740 
SP1741 
SP1745 
SP1751 
SP1757 
SP1758 
SP1761 
SP1762 
SP1763 
SP1764 
SP1765 
SP1766 
SP1767 
SP1768 
SP1770 
SP1771 
SP1772 
SP1783 
SP1802 
SP1828 
SP1856 
SP1867 
SP1869 
SP1870 
SP1872 
SP1891 
SP1907 
SP1910 
SP1911

SP1927 

SP1928 

SP1943 

SP1959 

SP2001 

SP2002 

SP2009 

SP2026 

SP2029 

SP2039 

SP2061 

SP2064 

SP2066 

SP2079 

SP2084 

SP2095 

SP2096 

SP2098 

SP2103 

SP2127 

SP2128 

SP2130 

SP2134 

SP2137 

SP2138 

SP2157 

SP2196

Table 37: Spn ORF's which share homologues with GAS
but do not share homologues with GBS



SP0065 

SP0075 

SP0090 

SP0091 

SP0092 

SP0099 

SP0100 

SP0153 

SP0155 

SP0156 

SP0200 

SP0306 

SP0313 

SP0341 

SP0476 

SP0496 

SP0509 

SP0527 

SP0648 

SP0658 

SP0659 

SP0661 

SP0677 

SP0715 

SP0742 

SP0743 

SP0858 

SP0859 

SP0860 

SP0910 

SP0986 

SP0994 

SP0999 

SP1000 

SP1001 

SP1023 

SP1075 

SP1129 

SP1147 

SP1171 

SP1186 

SP1315 

SP1317 

SP1319 

SP1320 

SP1321 

SP1322 

SP1438 

SP1442 

SP1525 

SP1546 

SP1570 

SP1572 

SP1578 

SP1604 

SP1715

SP1754 
SP1797 
SP1798 
SP1800 
SP1885 
SP1919 
SP1923 
SP1941 
SP1950 
SP2016 
SP2017 
SP2051 
SP2060 
SP2111 
SP2143 
SP2144 
SP2201 
SP2236